# 🧠 nanoGPT - TinyStories Character-Level Model

A compact character-level GPT model trained on the TinyStories dataset using Karpathy's nanoGPT.

## Model Details

| Parameter | Value |
|---|---|
| Parameters | 2,723,712 |
| Layers | 6 |
| Heads | 6 |
| Embedding Dim | 192 |
| Context Length | 256 |
| Vocab Size | 93 (character-level) |
| Training Iters | 2000 |
| dtype | float16 |
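The stated parameter count is consistent with these hyperparameters. A quick sanity check, assuming nanoGPT's standard architecture with `bias=False` and tied input/output embeddings (so the LM head adds no extra weights):

```python
n_embd, n_layer, vocab_size, block_size = 192, 6, 93, 256

# Embeddings: token table + learned positional table
embeddings = vocab_size * n_embd + block_size * n_embd

# Per transformer block: two LayerNorm weight vectors, the fused QKV
# projection, the attention output projection, and the 4x MLP (up + down)
per_block = (
    2 * n_embd                # ln_1 + ln_2 (weights only, no bias)
    + n_embd * 3 * n_embd     # c_attn (fused Q, K, V)
    + n_embd * n_embd         # attention output projection
    + n_embd * 4 * n_embd     # MLP up-projection
    + 4 * n_embd * n_embd     # MLP down-projection
)

total = embeddings + n_layer * per_block + n_embd  # + final LayerNorm
print(total)  # 2723712
```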

## Usage

```python
import torch, json
from model import GPTConfig, GPT

# Load config
with open('config.json') as f:
    cfg = json.load(f)

# Build model
conf = GPTConfig(
    vocab_size=cfg['vocab_size'], block_size=cfg['block_size'],
    n_layer=cfg['n_layer'], n_head=cfg['n_head'], n_embd=cfg['n_embd'],
    dropout=cfg['dropout'], bias=cfg['bias']
)
model = GPT(conf)
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()

# Tokenize & generate
from char_tokenizer import encode, decode
prompt = "Once upon a time"
ids = torch.tensor([encode(prompt)], dtype=torch.long)
out = model.generate(ids, max_new_tokens=200, temperature=0.8, top_k=40)
print(decode(out[0].tolist()))
```
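If `char_tokenizer` is not available, a character-level `encode`/`decode` pair is simple to reconstruct from the vocabulary. The sketch below uses a stand-in character set; the real model's 93-character vocabulary would have to come from the repo (for example, stored alongside `config.json`):

```python
# Hypothetical reconstruction of char_tokenizer: build lookup tables from the
# model's character vocabulary (stand-in string here; the real 93-character
# set must match the one used at training time).
chars = sorted(set("Once upon a time, there was a little dog."))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
itos = {i: ch for i, ch in enumerate(chars)}  # id -> char

def encode(text: str) -> list[int]:
    return [stoi[ch] for ch in text]

def decode(ids: list[int]) -> str:
    return ''.join(itos[i] for i in ids)

assert decode(encode("a little dog")) == "a little dog"
```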

## Training

Trained on a Google Colab T4 GPU for roughly 10 minutes, using the first ~20 MB of TinyStories-V2-GPT4-train.
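For reference, a nanoGPT training config matching the numbers above might look like the following. Every value not stated in this card (learning rate, batch size, dataset directory name, dropout) is an assumption, not the actual config used:

```python
# hypothetical config/train_tinystories_char.py for nanoGPT's train.py
out_dir = 'out-tinystories-char'
dataset = 'tinystories_char'   # assumed name of the prepared data directory

# architecture (from the table above)
n_layer = 6
n_head = 6
n_embd = 192
block_size = 256
bias = False           # assumption, consistent with the parameter count
dropout = 0.0          # assumption

# training run (2000 iters as stated; the rest are typical small-model values)
max_iters = 2000
lr_decay_iters = 2000
batch_size = 64        # assumption
learning_rate = 1e-3   # assumption
dtype = 'float16'
```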

## License

MIT
