opensourceprojects.dev

A broadsheet for software that doesn't ask for your email

nanoGPT: a clean, hackable 300-line GPT training loop that reproduces GPT-2 (124...

nanoGPT: A 300-Line GPT Training Loop You Can Actually Read

If you've ever stared at a massive transformer codebase and wondered "where does the actual training happen?", you're not alone. Most LLM training code is buried under layers of abstractions, distributed strategies, and framework-specific boilerplate. Andrej Karpathy's nanoGPT strips all of that away.

It's a clean, minimal, fully functional GPT training loop in about 300 lines of Python (train.py), with the model itself defined in a similarly short model.py. And yes, it reproduces GPT-2 (124M parameters) on OpenWebText.

What It Does

nanoGPT is a minimalist implementation of a GPT-style transformer for training from scratch. The core training script is just train.py, with the model defined next to it in model.py and few dependencies beyond PyTorch. You feed it text data, and it trains a language model with the same architecture and tokenizer as GPT-2.
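To make "feed it text data" concrete, here is a minimal sketch of how next-token training examples are typically cut from a token stream: each target sequence is the input sequence shifted one position to the right. The function name get_batch, its parameters, and the toy token list are assumptions for illustration in plain Python, not the repo's actual code, which works on tensors.

```python
import random

def get_batch(tokens, block_size, batch_size, rng):
    """Sample (input, target) pairs; each target is its input shifted by one token."""
    xs, ys = [], []
    for _ in range(batch_size):
        # Pick a start index that leaves room for block_size inputs plus one extra target.
        i = rng.randrange(len(tokens) - block_size)
        xs.append(tokens[i : i + block_size])
        ys.append(tokens[i + 1 : i + 1 + block_size])
    return xs, ys

# Stand-in for a tokenized corpus (hypothetical data): token ids 0..99.
tokens = list(range(100))
x, y = get_batch(tokens, block_size=8, batch_size=4, rng=random.Random(0))
```

At every position the model sees a prefix of x and is asked to predict the matching entry of y, so one sampled window yields block_size training signals.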

The 124M parameter version matches the original GPT-2 small configuration. You can scale it up or down by overriding a few config values such as n_layer, n_head, and n_embd. The repo includes data preparation scripts for OpenWebText and Shakespeare, the latter at both the GPT-2 BPE token level and the character level.
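As a rough check on where the "124M" figure comes from, the sketch below counts parameters for a GPT-2 style model from its configuration. The function gpt_param_count is hypothetical; it assumes the standard GPT-2 layout (learned position embeddings, a 4x MLP expansion, biases on linear and LayerNorm layers, and an output projection that shares weights with the token embedding).

```python
def gpt_param_count(n_layer, n_embd, block_size, vocab_size):
    """Count parameters of a GPT-2 style model (output head tied to token embedding)."""
    token_emb = vocab_size * n_embd
    pos_emb = block_size * n_embd
    layer_norm = 2 * n_embd                            # scale and shift
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd        # fused query/key/value projection
    attn_out = n_embd * n_embd + n_embd
    mlp_up = n_embd * 4 * n_embd + 4 * n_embd
    mlp_down = 4 * n_embd * n_embd + n_embd
    per_block = 2 * layer_norm + attn_qkv + attn_out + mlp_up + mlp_down
    # Final LayerNorm after the last block; the head count does not change the total.
    return token_emb + pos_emb + n_layer * per_block + layer_norm

# GPT-2 small: 12 layers, 768-dim embeddings, 1024-token context, 50257-token vocab.
total = gpt_param_count(n_layer=12, n_embd=768, block_size=1024, vocab_size=50257)
print(f"{total:,}")  # 124,439,808
```

Changing n_layer or n_embd in the call shows how quickly the count grows, which is the same lever you pull when scaling the model up or down.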

Why It's Cool

Three things make this stand out:

Hackability. The entire training loop lives in one file. You can read it top to bottom in an afternoon. Want to add gradient checkpointing? Change the learning rate schedule? Swap in a different attention mechanism? You can find the exact lines to modify without spelunking through five abstraction layers.

Reproducibility. It doesn't just run, it produces real results. The 124M model reaches a validation loss close to that of the original GPT-2. You're not just running a toy - you're reproducing published results with code that fits in a single scroll.

Pedagogical value. This is probably the best example code for understanding how transformer training really works under the hood. The comments are minimal but exactly where you need them. The code is explicit, not clever. It's written to be understood, not to be the shortest possible solution.
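To give a sense of how small these modifications can be, here is a sketch of a learning rate schedule of the kind mentioned under Hackability: linear warmup followed by cosine decay to a floor. The function name, its parameters, and the example values are assumptions for illustration; check train.py for the schedule the repo actually uses before editing it.

```python
import math

def get_lr(step, max_lr, min_lr, warmup_steps, decay_steps):
    """Linear warmup to max_lr, then cosine decay to min_lr (illustrative sketch)."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to max_lr over the warmup period.
        return max_lr * step / warmup_steps
    if step > decay_steps:
        # Past the decay window, hold at the floor.
        return min_lr
    # Progress through the decay window, from 0.0 to 1.0.
    ratio = (step - warmup_steps) / (decay_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # goes from 1.0 down to 0.0
    return min_lr + coeff * (max_lr - min_lr)

# Hypothetical values: peak 6e-4, floor 6e-5, 100 warmup steps, decay ending at step 1000.
schedule = [get_lr(s, 6e-4, 6e-5, 100, 1000) for s in (0, 50, 100, 550, 1000, 2000)]
```

Swapping in a different schedule means replacing this one function, which is the kind of local change a single-file training loop makes easy.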

How to Try It

You need PyTorch and a GPU (even a modest one works for small models). The full 124M reproduction on OpenWebText takes about 4 days on a single node with 8 A100 40GB GPUs. But you can start in minutes:

# Clone and install
git clone https://github.com/karpathy/nanoGPT.git
cd nanoGPT
pip install torch numpy tiktoken wandb tqdm

# Prepare data (character-level Shakespeare is tiny, great for testing)
python data/shakespeare_char/prepare.py

# Train a small model
python train.py config/train_shakespeare_char.py

The repo also has a sample.py script to generate text from your trained model. Start with Shakespeare. It trains fast enough to see real results in under an hour on a consumer GPU.
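Generating text means repeatedly picking the next token from the model's output scores. The sketch below shows one common way to do that, with temperature scaling and top-k filtering, in plain Python. The function sample_next and the example logits are hypothetical and are not taken from sample.py.

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token index from raw scores using temperature and optional top-k."""
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = [score / temperature for score in logits]
    if top_k is not None:
        # Keep only scores at or above the k-th largest; mask the rest out.
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Softmax numerators, shifted by the max for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

logits = [0.1, 2.0, -1.0]  # hypothetical scores for a 3-token vocabulary
greedy = sample_next(logits, top_k=1, rng=random.Random(0))
```

With top_k=1 the call always returns the highest-scoring token; raising top_k or the temperature trades consistency for variety in the generated text.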

Final Thoughts

nanoGPT isn't trying to be production infrastructure. It's not going to replace your distributed training pipeline. But that's the point. It exists to be read, understood, and modified. If you want to learn how GPTs actually work under the hood, or if you need a clean starting point for an experiment, this is the repo you want.

Sometimes the best code is the code you can fully understand. nanoGPT delivers exactly that.


Found this on @githubprojects

Project ID: 39e8a665-ff59-4158-b8d3-4d646574da7a
Last updated: June 26, 2026 at 04:12 AM