nanochat: Build Your Own ChatGPT Clone
A minimal, from-scratch, full-stack training and inference pipeline in ~8,000 lines of clean code
What is nanochat?
Full-Stack Pipeline
Complete training/inference pipeline from tokenizer to web UI in a single codebase
Minimal & Clean
~8,000 lines of clean code with dependency-minimal approach
Complete Training Stages
Tokenization
Train a custom tokenizer using a new Rust implementation
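The core idea behind tokenizers of this kind is byte-pair encoding (BPE): repeatedly merge the most frequent adjacent token pair into a new token. A minimal Python sketch of that loop (illustrative only; nanochat's actual trainer is a separate Rust implementation):

```python
from collections import Counter

def train_bpe(text, num_merges):
    """Toy BPE trainer: start from raw bytes, greedily merge frequent pairs."""
    tokens = list(text.encode("utf-8"))  # byte ids 0..255
    merges = {}
    next_id = 256  # new token ids start after the byte range
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges[pair] = next_id
        # replace every occurrence of the pair with the new token id
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
        next_id += 1
    return merges, tokens
```

Each merge shrinks the token sequence while growing the vocabulary; a real trainer runs tens of thousands of merges over gigabytes of text, which is why a fast Rust implementation matters.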
Pretraining
Pretrain a Transformer LLM on the FineWeb dataset, with CORE-metric evaluation
Midtraining
Midtrain on user-assistant conversations and tool use
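Midtraining on conversations requires flattening multi-turn chats into a single token stream with special markers around each turn. A minimal sketch of that rendering step (the `<|...|>` marker names here are hypothetical, not nanochat's actual special tokens):

```python
def render_conversation(messages):
    """Flatten chat messages into one training string.
    Marker names are illustrative placeholders, not nanochat's real tokens."""
    parts = []
    for msg in messages:
        # wrap each turn in role/end markers so the model learns turn structure
        parts.append(f"<|{msg['role']}|>{msg['content']}<|end|>")
    return "".join(parts)
```

During training, the loss is typically masked so the model is only penalized on assistant tokens, not on user or system tokens.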
Advanced Training Features
Supervised Fine-Tuning (SFT)
Fine-tune and evaluate the chat model on world-knowledge multiple choice, math, and code
Reinforcement Learning
Optional RL training with GRPO on GSM8K
Infrastructure & Deployment
Cloud-Ready
Boot up a cloud GPU box and run a single script to train your own model
Efficient Inference
Efficient inference with KV cache, prefill/decode phases, and tool use in a lightweight sandbox
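The point of a KV cache is that attention keys and values for earlier tokens never change, so they can be computed once (prefill, over the whole prompt) and then appended to one step at a time (decode). A toy single-head sketch of the idea, using plain Python lists rather than nanochat's actual engine:

```python
import math

def attend(q, keys, values):
    """Single-head attention of one query over all cached keys/values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
              for k in keys]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    w = [e / z for e in exps]  # softmax weights
    d = len(values[0])
    return [sum(w[t] * values[t][j] for t in range(len(values)))
            for j in range(d)]

class KVCache:
    """Append-only cache: prefill pushes the prompt's keys/values in bulk,
    decode adds exactly one entry per generated token."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, k, v, q):
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

Because decode reuses all cached entries, each new token costs attention over the existing sequence instead of recomputing the whole prefix from scratch.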
Performance Benchmarks
At about 12 hours of training, the model surpasses GPT-2 on the CORE metric.
Scaling up toward ~$1000 (~41.6 hours of training), the model becomes more coherent and can solve simple math/code problems and take multiple-choice tests.
A depth-30 model trained for 24 hours gets into the 40s on MMLU, the 70s on ARC-Easy, the 20s on GSM8K, etc.
💰 Training Cost Breakdown
Basic ChatGPT Clone
$100 • ~4 hours
Surpasses GPT-2
~12 hrs • GPT-2 Level
More Coherent
$1000 • ~41.6 hours
🎯 nanochat Goals & Features
Strong Baseline Stack
Get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo.
Research Potential
Has the potential to grow into a research harness or a benchmark, similar to nanoGPT before it.
LLM101n Capstone
nanochat will be the capstone project of LLM101n (which is still being developed).
Report Cards
Write a single markdown report card summarizing and gamifying the whole run.
Repository Features
Not finished, tuned, or optimized - likely quite a bit of low-hanging fruit
Chris Prakoso
Augmented Humanity | Practical AI, Data & Analytics
Connect with me for cutting-edge AI/ML insights, hands-on LLM tutorials, and the latest in open-source machine learning development