# 🧠 nanoGPT — TinyStories Character-Level Model
A compact character-level GPT model trained on the TinyStories dataset using Karpathy's nanoGPT.
## Model Details
| Parameter | Value |
|---|---|
| Parameters | 2,723,712 |
| Layers | 6 |
| Heads | 6 |
| Embedding Dim | 192 |
| Context Length | 256 |
| Vocab Size | 93 (character-level) |
| Training Iters | 2000 |
| dtype | float16 |
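As a sanity check, the parameter count in the table can be re-derived from the other config values. This sketch assumes nanoGPT's standard architecture with a weight-tied `lm_head` and `bias=False` (so layer norms contribute weights only):

```python
# Re-derive the parameter count from the config values in the table above.
# Assumes nanoGPT defaults: tied output head, bias=False.
n_layer, n_embd = 6, 192
vocab_size, block_size = 93, 256

emb = vocab_size * n_embd + block_size * n_embd  # token + position embeddings
per_layer = (
    n_embd * 3 * n_embd    # c_attn: fused query/key/value projection
    + n_embd * n_embd      # attention output projection
    + n_embd * 4 * n_embd  # MLP up-projection
    + 4 * n_embd * n_embd  # MLP down-projection
    + 2 * n_embd           # two LayerNorm weight vectors
)
total = emb + n_layer * per_layer + n_embd  # + final LayerNorm
print(total)  # 2723712
```

The match with the table suggests the checkpoint was indeed trained with `bias=False`.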
## Usage
```python
import torch, json
from model import GPTConfig, GPT

# Load config
with open('config.json') as f:
    cfg = json.load(f)

# Build model
conf = GPTConfig(
    vocab_size=cfg['vocab_size'], block_size=cfg['block_size'],
    n_layer=cfg['n_layer'], n_head=cfg['n_head'], n_embd=cfg['n_embd'],
    dropout=cfg['dropout'], bias=cfg['bias']
)
model = GPT(conf)
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()

# Tokenize & generate
from char_tokenizer import encode, decode
prompt = "Once upon a time"
ids = torch.tensor([encode(prompt)], dtype=torch.long)
out = model.generate(ids, max_new_tokens=200, temperature=0.8, top_k=40)
print(decode(out[0].tolist()))
```
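The `char_tokenizer` module is assumed to map each of the 93 vocabulary characters to an integer id and back. A minimal self-contained sketch of what such a tokenizer typically looks like (the stand-in corpus below is hypothetical; the real module presumably ships the vocab with the model):

```python
# Hypothetical reconstruction of a character-level tokenizer: the vocab is
# the sorted set of characters seen in the training text.
chars = sorted(set("Once upon a time, there was a little girl."))  # stand-in corpus
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
itos = {i: ch for ch, i in stoi.items()}      # id -> char

def encode(text: str) -> list[int]:
    # Characters outside the vocab raise KeyError.
    return [stoi[ch] for ch in text]

def decode(ids: list[int]) -> str:
    return ''.join(itos[i] for i in ids)

assert decode(encode("a little girl")) == "a little girl"
```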
## Training

Trained on a Google Colab T4 GPU for ~10 minutes on the first ~20 MB of TinyStories-V2-GPT4-train.
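For reference, nanoGPT prepares character-level data by encoding the raw text to integer ids and dumping them as `uint16` binaries that `train.py` memory-maps. A sketch under stated assumptions (the stand-in corpus, file names, and 90/10 split are illustrative, not taken from the actual run):

```python
import numpy as np

# Sketch of nanoGPT-style character-level data prep. The real run used
# ~20 MB of TinyStories-V2-GPT4-train; this stand-in corpus is hypothetical.
text = "Once upon a time, there was a tiny robot. " * 100
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)

# 90/10 train/val split, written as flat binaries for train.py to memory-map.
n = int(0.9 * len(ids))
ids[:n].tofile('train.bin')
ids[n:].tofile('val.bin')
```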
## License
MIT