github.com/karpathy/nanoGPT
Top Highlights
The simplest, fastest repository for training/finetuning medium-sized GPTs. It is a rewrite of minGPT that prioritizes teeth over education. Still under active development, but currently the file train.py reproduces GPT-2 (124M) on OpenWebText, running on a single 8XA100 40GB node in 38 hours of training. The code itself is plain and readable: train.py is a ~300-line boilerplate training loop and model.py a ~300-line GPT model definition, which can optionally load the GPT-2 weights from OpenAI. That's it.
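That ~300-line training loop reduces to a familiar pattern: sample a batch, run the forward pass, compute cross-entropy loss, backpropagate, step the optimizer. A minimal sketch of that shape, using a toy stand-in model and random data rather than the repository's actual train.py or GPT class:

```python
# Minimal sketch of a plain GPT-style training loop; an illustration of the
# shape of train.py (batch sampling, forward, loss, backward, step), not the
# repository's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, block_size, batch_size = 65, 64, 12   # toy settings, not nanoGPT defaults
data = torch.randint(0, vocab_size, (10_000,))    # stand-in for an encoded corpus

def get_batch():
    # pick random windows of block_size tokens; targets are the same windows shifted by one
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y

# Stand-in model; in nanoGPT this would be the GPT class defined in model.py.
model = nn.Sequential(nn.Embedding(vocab_size, 128), nn.Linear(128, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(200):
    x, y = get_batch()
    logits = model(x)                                   # (batch, block, vocab)
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```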
character-level
Because the code is so simple, it is very easy to hack to your needs, train new models from scratch, or finetune pretrained checkpoints (e.g. biggest one currently available as a starting point would be the GPT-2 1.3B model from OpenAI).
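Obtaining those pretrained weights is a one-liner; the sketch below uses the HuggingFace transformers package for illustration (an assumption here, though nanoGPT itself includes a loader that copies these same checkpoints into its own GPT definition):

```python
# Download an OpenAI GPT-2 checkpoint to use as a finetuning starting point.
# Sketch only: assumes the `transformers` package is installed.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")   # 124M; gpt2-medium, gpt2-large, gpt2-xl are larger
print(f"{model.num_parameters() / 1e6:.0f}M parameters in the checkpoint")
# From here the transformer weights would be copied into a local GPT definition
# and training continued (finetuned) on a new dataset.
```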
medium-sized GPTs
GPT-2 (124M) on OpenWebText, running on a single 8XA100 40GB node in about 4 days of training.
works of Shakespeare.
lol ¯\_(ツ)_/¯. Not bad for a character-level model after 3 minutes of training on a GPU.
If you peek inside it,
Better results are quite likely obtainable by instead finetuning a pretrained GPT-2 model on this dataset (see finetuning section later).
Not bad for ~3 minutes on a CPU, for a hint of the right character gestalt. If you're willing to wait longer, feel free to tune the hyperparameters, increase the size of the network, the context length (--block_size), the length of training, etc.
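nanoGPT's config files are plain Python that assign training variables, so scaling up means editing a handful of values (or overriding them on the command line, as with --block_size above). A hypothetical character-level config in that style; the specific names and values are assumptions for illustration, scaled up a little from the tiny 3-minute run:

```python
# Hypothetical character-level Shakespeare config, nanoGPT style (plain Python
# assignments). Names and values are illustrative assumptions, not the
# repository's shipped defaults.
out_dir = 'out-shakespeare-char'
dataset = 'shakespeare_char'

# model size: a somewhat bigger network than the quick demo run
n_layer = 8
n_head = 8
n_embd = 512
dropout = 0.2

# context and batch
block_size = 256        # the context length mentioned above (--block_size)
batch_size = 64

# optimization: train longer for better samples
learning_rate = 1e-3
max_iters = 10000
```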
reproducing GPT-2
finetuning