Tiny-LLM 54M

A small transformer language model (~54.93M parameters) trained from scratch for educational and experimental purposes.

Model Description

This is a decoder-only transformer trained from scratch on Wikipedia text. It demonstrates that meaningful language models can be trained on consumer hardware with modest compute budgets.

Architecture

Component            Value
-------------------  -------
Parameters           54.93M
Layers               12
Hidden Size          512
Attention Heads      8
Intermediate (FFN)   1408
Vocab Size           32,000
Max Sequence Length  512
Position Encoding    RoPE
Normalization        RMSNorm
Activation           SwiGLU
Weight Tying         Yes
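
As a sanity check, the headline parameter count can be reproduced from the table with a few lines of arithmetic. This is a rough sketch that assumes bias-free linear layers, tied embeddings, SwiGLU with three projection matrices, and one RMSNorm weight vector per norm:

# Rough parameter-count estimate from the architecture table above.
d_model, n_layers, d_ffn, vocab = 512, 12, 1408, 32_000

embed = vocab * d_model        # token embedding, tied with the LM head
attn = 4 * d_model * d_model   # Q, K, V and output projections per layer
ffn = 3 * d_model * d_ffn      # gate, up and down projections (SwiGLU) per layer
norms = 2 * d_model            # two RMSNorm weight vectors per layer

total = embed + n_layers * (attn + ffn + norms) + d_model  # + final RMSNorm
print(f"~{total / 1e6:.2f}M parameters")  # ~54.93M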

Training Details

Parameter       Value
--------------  ----------------------
Training Steps  50,000
Tokens          ~100M
Batch Size      32
Learning Rate   3e-4
Warmup Steps    2,000
Weight Decay    0.1
Hardware        NVIDIA RTX 5090 (32GB)
Training Time   ~3 hours
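
A minimal sketch of the optimizer setup these numbers imply is shown below. The AdamW optimizer and the linear-warmup-plus-cosine-decay schedule are assumptions; only the learning rate, warmup steps, and weight decay come from the table.

import math
import torch

# Values from the table above; optimizer choice and decay shape are assumptions.
lr, warmup_steps, total_steps, weight_decay = 3e-4, 2_000, 50_000, 0.1

model = torch.nn.Linear(512, 512)  # stand-in; replace with the Tiny-LLM model
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)

def lr_lambda(step):
    # Linear warmup for the first 2,000 steps, then cosine decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)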

Usage

from transformers import AutoTokenizer

# Load the tokenizer (a 32K-vocabulary subword tokenizer; see Training Data below)
tokenizer = AutoTokenizer.from_pretrained("jonmabe/tiny-llm-54m")

# The model uses a custom architecture; for model loading and inference code,
# see the model files and the scripts/ directory in the repository.

Generation Example

# Note: this model uses a custom architecture; the full inference code lives in
# the repository (scripts/). A minimal decoding sketch follows below.

prompt = "The history of artificial intelligence"
# The model generates a continuation based on patterns learned from Wikipedia.
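
For illustration only, here is a minimal greedy-decoding loop. The StandInLM class is a hypothetical placeholder, not the repository's actual model code; swap it for the model loaded via scripts/ and feed it real token ids from the tokenizer above.

import torch

# Hypothetical stand-in: any module mapping token ids (batch, seq) to logits
# (batch, seq, vocab). Replace with the model loaded via the repository's scripts/.
class StandInLM(torch.nn.Module):
    def __init__(self, vocab_size=32_000, d_model=512):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, d_model)
        self.head = torch.nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, ids):
        return self.head(self.embed(ids))

@torch.no_grad()
def generate_greedy(model, input_ids, max_new_tokens=50):
    # Append the most likely next token one step at a time (greedy decoding).
    for _ in range(max_new_tokens):
        logits = model(input_ids)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return input_ids

model = StandInLM().eval()
ids = torch.tensor([[1, 2, 3]])  # in practice: tokenizer(prompt, return_tensors="pt").input_ids
print(generate_greedy(model, ids, max_new_tokens=10))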

Intended Use

  • Educational: Understanding transformer training from scratch
  • Experimental: Testing fine-tuning approaches on small models
  • Personal LLM: Base for personal voice/style fine-tuning
  • Research: Lightweight model for NLP experiments

Limitations

  • Small model size limits knowledge and capabilities
  • Trained only on Wikipedia - limited domain coverage
  • Not suitable for production use cases requiring high quality
  • May generate factually incorrect information
  • No RLHF or instruction tuning

Training Data

  • Source: Wikipedia (English)
  • Processing: Tokenized with 32K vocabulary SentencePiece tokenizer
  • Format: Standard causal language modeling (next token prediction), sketched below
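
The causal language-modeling objective amounts to predicting each token from the ones before it. A minimal sketch of the loss, assuming logits of shape (batch, seq, vocab) from the model and the corresponding input ids:

import torch
import torch.nn.functional as F

def causal_lm_loss(logits, input_ids):
    # Predict token t+1 from positions up to t: drop the last logit, shift labels right.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )

# Example with random tensors (vocab size 32,000, sequence length 512).
logits = torch.randn(2, 512, 32_000)
input_ids = torch.randint(0, 32_000, (2, 512))
print(causal_lm_loss(logits, input_ids))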

Future Work

This model is intended as a base for:

  1. Personal Fine-tuning: Adapt to individual writing style using personal data
  2. Domain Adaptation: Specialize for specific topics or tasks
  3. Instruction Tuning: Add instruction-following capabilities

Hardware Requirements

  • Inference: ~300MB GPU memory; runs on any modern GPU or Apple Silicon
  • Fine-tuning: ~2GB GPU memory recommended (rough estimates below)
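
These figures are consistent with a back-of-envelope estimate. The sketch below assumes fp16 weights for inference and full fp32 fine-tuning with AdamW, and ignores activations and the KV cache, which account for the remaining headroom.

params = 54.93e6  # parameter count from the architecture table

# Inference: fp16 weights are ~2 bytes per parameter (~110 MB); activations and
# the KV cache bring the total toward the ~300 MB figure above.
print(f"fp16 weights: ~{params * 2 / 1e6:.0f} MB")

# Full fine-tuning with AdamW in fp32: weights + gradients + two optimizer
# states is roughly 16 bytes per parameter, before activations (~880 MB).
print(f"AdamW training state: ~{params * 16 / 1e6:.0f} MB")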

Related Work

Inspired by:

  • Andrej Karpathy's nanoGPT
  • Geddy Duke's small LLM experiments
  • LLaMA architecture design choices

Citation

@misc{tiny-llm-54m,
  author = {jonmabe},
  title = {Tiny-LLM: A 54M Parameter Language Model},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/jonmabe/tiny-llm-54m}
}