Model Description

This model was pretrained using the Unsloth framework. Only the lm_head and embedding layers were trained; all other weights were frozen. Approximately 15% of the Turkish Wikipedia dataset was used during training (about 4 hours on a single A100).
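For illustration, a minimal sketch of this selective-freezing setup using plain transformers (the actual Unsloth calls are not shown in this card; the base checkpoint name is an assumption, and the module names lm_head and embed_tokens follow the standard Gemma implementation):

```python
from transformers import AutoModelForCausalLM

# Assumed base checkpoint; the card does not name the exact starting model.
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-pt")

# Freeze every parameter, then unfreeze only the output head and input embeddings.
for name, param in model.named_parameters():
    param.requires_grad = ("lm_head" in name) or ("embed_tokens" in name)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```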

Training Details

Hyperparameters

  • Epochs: 1
  • Per-device batch size: 32
  • Gradient accumulation steps: 2
  • Effective batch size: 64
  • Learning rate: 5e-5
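
For reference, a hedged sketch of how these hyperparameters map onto a transformers TrainingArguments configuration (the actual training script is not included in this card; output_dir and the remaining fields are illustrative assumptions):

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; other fields are illustrative defaults.
args = TrainingArguments(
    output_dir="gemma3-4b-freezed-pretrain",  # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,   # effective batch size: 32 * 2 = 64
    learning_rate=5e-5,
    bf16=True,                       # assumption: mixed precision on the A100
    logging_steps=50,
)
```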

Model Information

  • Developed by: AhmetSemih
  • License: Apache 2.0
  • Model size: 4B parameters (Safetensors)
  • Tensor types: F32, BF16
