## Model Description
This model was pretrained using the Unsloth framework. Only the token-embedding matrix and the lm_head were trained; all other weights were kept frozen. Approximately 15% of the Turkish Wikipedia dataset was used during training, which took about 4 hours on a single A100 GPU.
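As a rough illustration (not the authors' exact training script), freezing everything except the embeddings and the output head with the standard transformers API might look like the sketch below. The parameter-name patterns `embed_tokens` and `lm_head` follow Gemma's naming, and the base-model id is taken from the model lineage listed further down.

```python
# Illustrative sketch, not the exact training script: freeze every weight
# except the token embeddings and the lm_head before continued pretraining.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-pt",  # base checkpoint per the model lineage below
    torch_dtype=torch.bfloat16,
)

for name, param in model.named_parameters():
    # Gemma names its embedding matrix "embed_tokens"; Gemma also ties
    # lm_head to the embeddings, so matching either pattern covers both.
    param.requires_grad = ("embed_tokens" in name) or ("lm_head" in name)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} of {total:,}")
```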
## Training Details

### Hyperparameters
- Epochs: 1
- Per-device batch size: 32
- Gradient accumulation steps: 2
- Effective batch size: 64 (32 × 2)
- Learning rate: 5e-5
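Expressed with transformers `TrainingArguments`, the settings above would look roughly like this; `output_dir` and any unlisted arguments are placeholders or library defaults, not values from the card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma3-4b-freezed-pretrain",  # placeholder, not the real path
    num_train_epochs=1,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,  # effective batch size: 32 * 2 = 64
    learning_rate=5e-5,
)
```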
## Model Information
- Developed by: AhmetSemih
- License: Apache 2.0
## Model Lineage
google/gemma-3-4b-pt → google/gemma-3-4b-it → AhmetSemih/tr-gemma-128k-4b → AhmetSemih/gemma3-4b-freezed-pretrain_final
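## Usage
A minimal way to try the checkpoint, assuming a transformers version with Gemma 3 support; the prompt and generation settings are illustrative, not from the card.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="AhmetSemih/gemma3-4b-freezed-pretrain_final",
)

# Turkish prompt, since the model was pretrained on Turkish Wikipedia.
out = generator("Türkiye'nin başkenti", max_new_tokens=50)
print(out[0]["generated_text"])
```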