Train your own language model with nanoGPT
Let’s build a songwriter
This morning, I watched Andrej Karpathy’s Build ChatGPT from Scratch video, and I was so impressed. Only a true legend can make such a complex model look so effortless. In the video, he builds a GPT language model from scratch in only a few hundred lines of code and organizes everything in the nanoGPT GitHub repo. I can’t wait to give it a try, so in this blog post I’m going to try out nanoGPT and see if I can use it to train a songwriter.
Train a Shakespeare writer (following repo instructions)
Before we build our songwriter, let’s first follow the instructions on the nanoGPT repo to build a Shakespeare writer.
Step 1: Download Anaconda
We will first need to install Python. Downloading Anaconda is the easiest and recommended way to set up both Python and Conda environment management.
Step 2: Set up Conda environment
Let’s create a new Conda environment called “nanoGPT”:
conda create -n nanoGPT
Then we activate this environment and install the needed packages:
conda activate nanoGPT
conda install pytorch numpy transformers datasets tiktoken wandb tqdm pandas -c conda-forge
Step 3: Prepare training data
We download the Shakespeare text to an input.txt file and select 90% of the text as training data and 10% of the text as validation data. The prepare.py file also creates a train.bin and val.bin in the data directory for later use.
python data/shakespeare_char/prepare.py
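To make the preparation step more concrete, here is a minimal sketch of a character-level split and encoding. It is not the repo's prepare.py: the function names, the sample string, the contiguous 90/10 cut, and the 16-bit integer encoding are all assumptions made for illustration, and it uses only the Python standard library.

```python
from array import array


def build_vocab(text):
    """Map each distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of integer ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of integer ids back into a string."""
    return "".join(itos[i] for i in ids)


def split_train_val(text, train_fraction=0.9):
    """Assumed contiguous split: leading part for training, the rest for validation."""
    cut = int(len(text) * train_fraction)
    return text[:cut], text[cut:]


def to_binary(ids):
    """Pack ids as unsigned 16-bit integers (an assumed on-disk format)."""
    return array("H", ids).tobytes()


# Hypothetical stand-in for the contents of input.txt.
sample = "To be, or not to be: that is the question."

stoi, itos = build_vocab(sample)
train_text, val_text = split_train_val(sample)
train_ids = encode(train_text, stoi)
val_ids = encode(val_text, stoi)

# These byte strings are what a script could write to train.bin and val.bin.
train_bytes = to_binary(train_ids)
val_bytes = to_binary(val_ids)
```

Building the vocabulary from the full text before splitting means every character in the validation set has an id, which keeps encoding from failing on characters that only appear near the end of the file.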
Step 4: Train your model
On my Apple M1 computer, I tried the simplified version of the model. Note that I changed the device from cpu to mps to use its Metal GPU.
python train.py config/train_shakespeare_char.py --device=mps --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=4 --n_head=4 --n_embd=128 --…
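Before passing mps as the device, it can help to confirm that PyTorch actually sees the Metal backend on your machine. The snippet below is a small sketch for that check; the pick_device helper is a hypothetical name introduced here, and the code falls back to cpu if PyTorch is missing or too old to expose the MPS backend.

```python
def pick_device(has_cuda, has_mps):
    """Prefer CUDA, then Apple's Metal backend (mps), then plain CPU."""
    if has_cuda:
        return "cuda"
    if has_mps:
        return "mps"
    return "cpu"


try:
    import torch

    device = pick_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
except (ImportError, AttributeError):
    # PyTorch is not installed, or this version has no MPS backend.
    device = "cpu"

print(f"Suggested device: {device}")
```

If this prints cpu on an Apple Silicon Mac, the installed PyTorch build probably lacks Metal support, and the training command should use cpu as the device instead.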