Train your own language model with nanoGPT

Let’s build a songwriter

Sophia Yang, Ph.D.
6 min read · Mar 20


This morning, I watched Andrej Karpathy’s Build ChatGPT from Scratch video, and I was so impressed. Only a true legend can make such a complex model look so effortless. In the video, he builds a GPT language model from scratch in only a few hundred lines of code and organizes everything in the nanoGPT GitHub repo. I couldn’t wait to give it a try. So in this blog post, I’m going to try out nanoGPT and see if I can use it to train a songwriter.

Train a Shakespeare writer (following repo instructions)

Before we build our songwriter, let’s first follow the instructions on the nanoGPT repo to build a Shakespeare writer.

Step 1: Download Anaconda

We first need to install Python. Downloading Anaconda is the easiest and recommended way to set up both Python and Conda environment management.

Step 2: Set up Conda environment

Let’s create a new Conda environment called “nanoGPT”:

conda create -n nanoGPT

Then we activate this environment and install the needed packages:

conda activate nanoGPT
conda install pytorch numpy transformers datasets tiktoken wandb tqdm pandas -c conda-forge
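Before moving on, it can help to confirm that the packages are importable from the new environment. The snippet below is a minimal sketch, not part of the nanoGPT repo: the helper name missing_modules is my own, and the only assumption about the packages is that the conda package pytorch is imported in Python as torch.

```python
from importlib.util import find_spec

# Import names for the packages installed above.
# Note: the conda package "pytorch" is imported as "torch".
REQUIRED = ["torch", "numpy", "transformers", "datasets",
            "tiktoken", "wandb", "tqdm", "pandas"]


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```

Run it with the nanoGPT environment activated; anything it reports as missing can be installed with another conda install.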

Step 3: Prepare training data

The data preparation script downloads the Shakespeare text to an input.txt file, then uses 90% of the text as training data and the remaining 10% as validation data. It also writes train.bin and val.bin to the data directory for later use.

python data/shakespeare_char/
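To make this step less of a black box, here is a rough sketch of what a character-level preparation step can look like: build a vocabulary from the unique characters, map each character to an integer, split 90/10, and write the integers to binary files. This is my own illustration under stated assumptions, not the repo's script. The function names are hypothetical, I use Python's built-in array module with 16-bit unsigned integers as the on-disk format, and I use a short sample string instead of the downloaded input.txt.

```python
import os
import tempfile
from array import array


def build_vocab(text):
    """Map each unique character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    return "".join(itos[i] for i in ids)


def split_and_encode(text, train_fraction=0.9):
    """Encode the text and split it into training and validation ids."""
    stoi, itos = build_vocab(text)
    n = int(train_fraction * len(text))
    train_ids = encode(text[:n], stoi)
    val_ids = encode(text[n:], stoi)
    return train_ids, val_ids, stoi, itos


def write_bin(ids, path):
    """Write ids as unsigned 16-bit integers (assumed format for this sketch)."""
    with open(path, "wb") as f:
        array("H", ids).tofile(f)


if __name__ == "__main__":
    # Stand-in for the contents of input.txt.
    sample = "To be, or not to be, that is the question.\n" * 10
    train_ids, val_ids, stoi, itos = split_and_encode(sample)
    out_dir = tempfile.mkdtemp()
    write_bin(train_ids, os.path.join(out_dir, "train.bin"))
    write_bin(val_ids, os.path.join(out_dir, "val.bin"))
    print(f"vocab size: {len(stoi)}, train: {len(train_ids)}, val: {len(val_ids)}")
```

The key idea is that the model never sees raw text, only sequences of integer ids, and the validation split is held out so we can check whether the model generalizes.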

Step 4: Train your model

On my Apple M1 computer, I tried the simplified version of the model. Note that I changed the device from cpu to mps to use its Metal GPU.

python config/ --device=mps --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=4 --n_head=4 --n_embd=128 …
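Each flag in the command above has the form key=value and replaces a default training setting. As a rough illustration of how such overrides can be applied, here is a small sketch. It is not the repo's own configuration code: the apply_overrides function and the default values are hypothetical, chosen only to show the idea.

```python
from ast import literal_eval


def apply_overrides(config, args):
    """Return a copy of config with --key=value arguments applied (hypothetical helper)."""
    updated = dict(config)
    for arg in args:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        key, raw = arg[2:].split("=", 1)
        if key not in updated:
            raise KeyError(f"unknown config key: {key}")
        try:
            # Turns "64" into 64 and "False" into False.
            value = literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal, such as mps, stays a string.
            value = raw
        updated[key] = value
    return updated


if __name__ == "__main__":
    # Illustrative defaults only; the real defaults live in the repo.
    defaults = {"device": "cpu", "compile": True, "block_size": 256,
                "batch_size": 64, "n_layer": 6, "n_head": 6, "n_embd": 384}
    overrides = ["--device=mps", "--compile=False", "--block_size=64",
                 "--batch_size=12", "--n_layer=4", "--n_head=4", "--n_embd=128"]
    print(apply_overrides(defaults, overrides))
```

Shrinking block_size, n_layer, n_head, and n_embd like this gives a much smaller model, which is what makes training feasible on a laptop.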


