# Tiny-LLM 54M
A small transformer language model (~54.93M parameters) trained from scratch for educational and experimental purposes.
## Model Description
This is a decoder-only transformer trained from scratch on Wikipedia text. It demonstrates that meaningful language models can be trained on consumer hardware with modest compute budgets.
## Architecture
| Component | Value |
|---|---|
| Parameters | 54.93M |
| Layers | 12 |
| Hidden Size | 512 |
| Attention Heads | 8 |
| Intermediate (FFN) | 1408 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | SwiGLU |
| Weight Tying | Yes |
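
The ~54.93M figure can be reproduced from the table above. A minimal sketch, assuming a LLaMA-style layout: no bias terms, a three-matrix SwiGLU FFN, tied input/output embeddings, and RoPE contributing no learned weights:

```python
# Approximate parameter count from the architecture table
vocab, d_model, n_layers, d_ffn = 32_000, 512, 12, 1408

embedding = vocab * d_model              # 16.38M, shared with the output head (weight tying)
attention = 4 * d_model * d_model        # Q, K, V, O projections
ffn       = 3 * d_model * d_ffn          # SwiGLU: gate, up, down projections
norms     = 2 * d_model                  # two RMSNorm weight vectors per layer

total = embedding + n_layers * (attention + ffn + norms) + d_model  # + final RMSNorm
print(f"~{total / 1e6:.2f}M parameters")  # ~54.93M
```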
## Training Details
| Parameter | Value |
|---|---|
| Training Steps | 50,000 |
| Tokens | ~100M |
| Batch Size | 32 |
| Learning Rate | 3e-4 |
| Warmup Steps | 2,000 |
| Weight Decay | 0.1 |
| Hardware | NVIDIA RTX 5090 (32GB) |
| Training Time | ~3 hours |
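
For reference, here is a sketch of the learning-rate schedule implied by the table: linear warmup over 2,000 steps to a peak of 3e-4. The decay shape after warmup is not stated in this card; cosine decay to zero over the remaining steps is assumed purely for illustration:

```python
import math

def lr_at(step: int, peak: float = 3e-4, warmup: int = 2_000, total: int = 50_000) -> float:
    """Linear warmup to `peak`, then (assumed) cosine decay to zero."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))

print(lr_at(1_000), lr_at(2_000), lr_at(50_000))  # warming up, peak, ~0
```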
## Usage

```python
from transformers import AutoTokenizer

# Load the tokenizer (32K-vocabulary SentencePiece; see Training Data below)
tokenizer = AutoTokenizer.from_pretrained("jonmabe/tiny-llm-54m")

# The model itself uses a custom architecture; loading and inference code
# lives in the repository's scripts/ directory.
```
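
The tokenizer can be exercised on its own even without the custom model code. A quick sanity check:

```python
# Round-trip a prompt through the tokenizer
ids = tokenizer("The history of artificial intelligence").input_ids
print(len(ids), ids[:8])
print(tokenizer.decode(ids))
```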
## Generation Example

```python
# This model uses a custom architecture; full inference code is available
# in the repository.
prompt = "The history of artificial intelligence"
# The model generates a continuation based on patterns learned from Wikipedia.
```
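
As a rough guide to what the repository's inference code does, here is a minimal greedy-decoding sketch. The `model` object and its call signature are assumptions: any causal LM that maps a batch of token ids to next-token logits of shape `(batch, seq, vocab)` would work this way; the actual loader lives in `scripts/`.

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
    # `model` is assumed to return raw logits of shape (batch, seq, vocab)
    ids = torch.tensor([tokenizer(prompt).input_ids])
    for _ in range(max_new_tokens):
        logits = model(ids[:, -512:])                             # respect the 512-token context
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # greedy pick
        ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0].tolist())
```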
## Intended Use
- Educational: Understanding transformer training from scratch
- Experimental: Testing fine-tuning approaches on small models
- Personal LLM: Base for personal voice/style fine-tuning
- Research: Lightweight model for NLP experiments
## Limitations
- Small model size limits knowledge and capabilities
- Trained only on English Wikipedia, so domain coverage is limited
- Not suitable for production use cases requiring high quality
- May generate factually incorrect information
- No RLHF or instruction tuning
## Training Data
- Source: Wikipedia (English)
- Processing: Tokenized with 32K vocabulary SentencePiece tokenizer
- Format: Standard causal language modeling (next token prediction)
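
Concretely, "next token prediction" means the targets are the input ids shifted one position to the left. A generic sketch of the objective (not the repository's training code):

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq, vocab); position t predicts token t+1
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )
```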
## Future Work
This model is intended as a base for:
- Personal Fine-tuning: Adapt to individual writing style using personal data
- Domain Adaptation: Specialize for specific topics or tasks
- Instruction Tuning: Add instruction-following capabilities
## Hardware Requirements
- Inference: ~300MB GPU memory, runs on any modern GPU or Apple Silicon
- Fine-tuning: ~2GB GPU memory recommended
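
These figures roughly follow from the parameter count; the weights account for most of the inference footprint, with the remainder going to activations and framework overhead:

```python
params = 54.93e6
print(f"fp32 weights: ~{params * 4 / 1e6:.0f} MB")  # ~220 MB -> ~300 MB with activations/overhead
print(f"fp16 weights: ~{params * 2 / 1e6:.0f} MB")  # ~110 MB
```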
## Related Work
Inspired by:
- Andrej Karpathy's nanoGPT
- Geddy Duke's small LLM experiments
- LLaMA architecture design choices
## Citation

```bibtex
@misc{tiny-llm-54m,
  author    = {jonmabe},
  title     = {Tiny-LLM: A 54M Parameter Language Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/jonmabe/tiny-llm-54m}
}
```