A scaled-down version of the Gemma architecture trained on the Tiny Shakespeare dataset.

Model

  • Architecture: Gemma (Transformer Decoder)
  • Attention: Multi-Query Attention (MQA)
  • Hidden Size: 768
  • Number of Layers: 12
  • Number of Query Heads: 2
  • Number of KV Heads: 1
  • Sequence Length: 128 (Block Size)
  • Vocabulary Size: 65 (Character-level encoding)
  • Total Training Steps: 3,500
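The MQA configuration above (2 query heads sharing 1 KV head, hidden size 768, block size 128) can be sketched in numpy. This is an illustrative sketch of the attention pattern only, not the model's actual implementation; the shapes follow the card's hyperparameters.

```python
import numpy as np

# Sketch of Multi-Query Attention with the card's shapes:
# hidden 768, 2 query heads, 1 shared KV head, sequence length 128.
hidden, n_q_heads, n_kv_heads, seq = 768, 2, 1, 128
head_dim = hidden // n_q_heads  # 384

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq, head_dim))
k = rng.standard_normal((n_kv_heads, seq, head_dim))
v = rng.standard_normal((n_kv_heads, seq, head_dim))

# MQA: the single KV head is broadcast across both query heads,
# shrinking the KV cache relative to full multi-head attention.
k = np.broadcast_to(k, (n_q_heads, seq, head_dim))
v = np.broadcast_to(v, (n_q_heads, seq, head_dim))

scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
# Causal mask: position i may only attend to positions <= i.
mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
scores = np.where(mask, -np.inf, scores)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v  # shape (2, 128, 384)
```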

Architecture

  1. RMSNorm
  2. GeGLU
  3. RoPE
  4. Embedding Scaling
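The components above can be sketched in numpy. This is a hedged sketch of the standard formulations (Gemma-style RMSNorm with a `1 + weight` scale, tanh-approximate GELU for GeGLU, and embedding scaling by the square root of the hidden size), not the model's actual code.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize by root-mean-square; Gemma scales by (1 + learned weight).
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * (1.0 + weight)

def geglu(x, w_gate, w_up):
    # GeGLU: GELU(x @ w_gate) * (x @ w_up), tanh-approximate GELU.
    g = x @ w_gate
    gelu = 0.5 * g * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (g + 0.044715 * g**3)))
    return gelu * (x @ w_up)

hidden = 768
rng = np.random.default_rng(1)
x = rng.standard_normal((4, hidden))
x = x * np.sqrt(hidden)            # embedding scaling, as in Gemma
y = rms_norm(x, np.zeros(hidden))  # zero weight -> unit-RMS output

# Hypothetical projection sizes, just to exercise geglu.
w_gate = rng.standard_normal((hidden, 4))
w_up = rng.standard_normal((hidden, 4))
h = geglu(y, w_gate, w_up)         # shape (4, 4)
```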

Usage

You can load this model directly using the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("parkneurals/Gemma")

# Note: This model uses a custom character-level tokenizer. 
# You can use the provided char_map.json for encoding/decoding.
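The exact format of char_map.json is not documented here; assuming it maps each character to an integer id, encoding and decoding might look like the sketch below. The mapping is built from sample text purely for illustration.

```python
# Hypothetical: assumes char_map.json maps characters to integer ids,
# e.g. {" ": 0, "!": 1, ...}. Built from sample text here for illustration;
# in practice you would load it with json.load(open("char_map.json")).
text = "First Citizen: Speak."
char_map = {ch: i for i, ch in enumerate(sorted(set(text)))}
id_to_char = {i: ch for ch, i in char_map.items()}

def encode(s):
    return [char_map[c] for c in s]

def decode(ids):
    return "".join(id_to_char[i] for i in ids)

ids = encode("Speak")
```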

Note: inference is slow because the RoPE rotation matrices are recomputed in every layer for each token. The model was built purely for learning purposes, so please bear with it if you use it.
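The per-token rotation cost described above is commonly avoided by precomputing the cos/sin tables once and reusing them across all layers. A hedged numpy sketch of that standard optimization (not this model's code), using the card's block size and head dimension:

```python
import numpy as np

def rope_tables(seq_len, head_dim, base=10000.0):
    # Precompute once: cos/sin for every (position, frequency) pair.
    inv_freq = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq, head_dim // 2)
    return np.cos(angles), np.sin(angles)

def apply_rope(x, cos, sin):
    # Rotate even/odd channel pairs; x has shape (seq, head_dim).
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Tables built once, then shared by every layer at every step.
cos, sin = rope_tables(128, 384)  # block size 128, head dim 384
q = np.random.default_rng(2).standard_normal((128, 384))
q_rot = apply_rope(q, cos, sin)
```

Because each channel pair undergoes a pure rotation, vector norms are preserved, which is a quick sanity check for a RoPE implementation.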

Format: Safetensors · Model size: 0.1B params · Tensor type: F32

Dataset used to train parkneurals/Gemma: Tiny Shakespeare