Deep Learning Discussion: The Bigram Language Model
Part 2
For Next Session:
Explore the bigram model – Try to improve on the bigram model we wrote this week (see the sketch after this list). How much is performance influenced by…
Optimizer choice?
Learning rate?
Number and size of hidden layers?
Batch size?
Randomness of the training set?
…
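
As a starting point for these experiments, here is a minimal, self-contained sketch (assuming PyTorch, which our notebook uses) that exposes each of the knobs above as an argument. The tiny corpus, the `BigramMLP` class, and the `train` helper are illustrative, not the exact code from our notebook.

```python
# Minimal sketch for exploring bigram-model hyperparameters (assumes PyTorch).
# Names and the toy corpus are illustrative, not from this week's notebook.
import torch
import torch.nn as nn
import torch.nn.functional as F

text = "hello world, this is a tiny corpus for a bigram model demo. "
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
vocab_size = len(chars)

# Bigram training pairs: each character predicts the next one.
ids = torch.tensor([stoi[c] for c in text], dtype=torch.long)
xs, ys = ids[:-1], ids[1:]

class BigramMLP(nn.Module):
    """Bigram model: predicts the next character from the current one.
    hidden_sizes=() gives a plain linear lookup table; add entries for an MLP."""
    def __init__(self, vocab_size, hidden_sizes=(64,)):
        super().__init__()
        self.vocab_size = vocab_size
        sizes = [vocab_size, *hidden_sizes, vocab_size]
        layers = []
        for i in range(len(sizes) - 1):
            layers.append(nn.Linear(sizes[i], sizes[i + 1]))
            if i < len(sizes) - 2:        # ReLU between all but the last layer
                layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(F.one_hot(x, num_classes=self.vocab_size).float())

def train(optimizer_cls=torch.optim.AdamW, lr=1e-2, hidden_sizes=(64,),
          batch_size=32, seed=0, steps=500):
    torch.manual_seed(seed)  # controls weight init and batch sampling randomness
    model = BigramMLP(vocab_size, hidden_sizes)
    opt = optimizer_cls(model.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randint(0, len(xs), (batch_size,))  # random minibatch
        loss = F.cross_entropy(model(xs[idx]), ys[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Final training-batch loss; noisy, so use a held-out split for real comparisons.
    return loss.item()

# Vary one knob at a time and compare, e.g. learning rate:
for lr in (1e-1, 1e-2, 1e-3):
    print(f"lr={lr}: loss={train(lr=lr):.3f}")
```

The same pattern works for the other knobs: pass `optimizer_cls=torch.optim.SGD`, a different `hidden_sizes` tuple, another `batch_size`, or a different `seed`, and keep everything else fixed.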
Next session we will be discussing attention and transformers. To prepare:
Get an introduction to transformers and attention with Grant Sanderson’s But what is a GPT 0h27 and Visualizing Attention 0h26 (a minimal code sketch of the attention operation follows this list).
Then work through Generatively Pretrained Transformer 1h56, following along in a notebook.
Finally (optionally) watch How might LLMs store facts 0h22.
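
If you want a code-level preview before the videos, here is a minimal sketch of scaled dot-product attention, the core operation they visualize. The `attention` function and tensor shapes are illustrative, not taken from any of those materials.

```python
# Minimal sketch of scaled dot-product attention (assumes PyTorch).
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, seq_len, d). The weights say how strongly each
    # position attends to every other position in the sequence.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d**0.5  # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)
    return weights @ v                         # weighted sum of values

q = k = v = torch.randn(1, 5, 16)  # self-attention: all three from the same input
print(attention(q, k, v).shape)    # torch.Size([1, 5, 16])
```

Note this sketch omits the causal mask that stops a position from attending to later positions; the 1h56 video covers that and the surrounding transformer machinery in full.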
Materials
- Here is the notebook we finished writing today (Demo Notebook Part 2).