Deep Learning Discussion: The Bigram Language Model
Part 2
For Next Session:
Explore the bigram model – Try to improve on the bigram model we wrote this week (see the sketch after this list). How much is performance influenced by…
Optimizer choice?
Learning rate?
Number and size of hidden layers?
Batch size?
Randomness of the training set?
…
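
As a starting point for these experiments, here is a minimal, self-contained sketch (assuming PyTorch, which our notebook uses) that exposes each of the knobs above as an argument. The tiny corpus, the `BigramMLP` class, and the `train` helper are illustrative, not the exact code from our notebook.

```python
# Minimal sketch for exploring bigram-model hyperparameters (assumes PyTorch).
# Names and the toy corpus are illustrative, not from this week's notebook.
import torch
import torch.nn as nn
import torch.nn.functional as F

text = "hello world, this is a tiny corpus for a bigram model demo. "
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
vocab_size = len(chars)

# Bigram training pairs: each character predicts the next one.
ids = torch.tensor([stoi[c] for c in text], dtype=torch.long)
xs, ys = ids[:-1], ids[1:]

class BigramMLP(nn.Module):
    """Bigram model: predicts the next character from the current one.
    hidden_sizes=() gives a plain linear lookup table; add entries for an MLP."""
    def __init__(self, vocab_size, hidden_sizes=(64,)):
        super().__init__()
        self.vocab_size = vocab_size
        sizes = [vocab_size, *hidden_sizes, vocab_size]
        layers = []
        for i in range(len(sizes) - 1):
            layers.append(nn.Linear(sizes[i], sizes[i + 1]))
            if i < len(sizes) - 2:        # ReLU between all but the last layer
                layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(F.one_hot(x, num_classes=self.vocab_size).float())

def train(optimizer_cls=torch.optim.AdamW, lr=1e-2, hidden_sizes=(64,),
          batch_size=32, seed=0, steps=500):
    torch.manual_seed(seed)  # controls weight init and batch sampling randomness
    model = BigramMLP(vocab_size, hidden_sizes)
    opt = optimizer_cls(model.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randint(0, len(xs), (batch_size,))  # random minibatch
        loss = F.cross_entropy(model(xs[idx]), ys[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Final training-batch loss; noisy, so use a held-out split for real comparisons.
    return loss.item()

# Vary one knob at a time and compare, e.g. learning rate:
for lr in (1e-1, 1e-2, 1e-3):
    print(f"lr={lr}: loss={train(lr=lr):.3f}")
```

The same pattern works for the other knobs: pass `optimizer_cls=torch.optim.SGD`, a different `hidden_sizes` tuple, another `batch_size`, or a different `seed`, and keep everything else fixed.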
Next session we will be discussing attention and transformers. To prepare:
Get an introduction to transformers and attention with Grant Sanderson’s But what is a GPT 0h27 and Visualizing Attention 0h26 (a minimal code sketch of the attention operation follows this list).
Then work through Generatively Pretrained Transformer 1h56, following along in a notebook.
Finally (optionally) watch How might LLMs store facts 0h22.
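
If you want a code-level preview before the videos, here is a minimal sketch of scaled dot-product attention, the core operation they visualize. The `attention` function and tensor shapes are illustrative, not taken from any of those materials.

```python
# Minimal sketch of scaled dot-product attention (assumes PyTorch).
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, seq_len, d). The weights say how strongly each
    # position attends to every other position in the sequence.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d**0.5  # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)
    return weights @ v                         # weighted sum of values

q = k = v = torch.randn(1, 5, 16)  # self-attention: all three from the same input
print(attention(q, k, v).shape)    # torch.Size([1, 5, 16])
```

Note this sketch omits the causal mask that stops a position from attending to later positions; the 1h56 video covers that and the surrounding transformer machinery in full.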
Materials
- Here is the notebook we finished writing today (Demo Notebook Part 2).