Qwen3 0.6B SFT (DGX Spark)

A supervised fine-tuned version of Qwen3-0.6B trained on the no_robots dataset.

Training Details

This model was trained on an NVIDIA DGX Spark (GB10 Blackwell GPU) as part of testing open-instruct on the new hardware platform.

  • Base model: Qwen/Qwen3-0.6B
  • Dataset: HuggingFaceH4/no_robots
  • Epochs: 2
  • Batch size: 32
  • Gradient accumulation: 4
  • Learning rate: 2e-5
  • Scheduler: cosine
  • Max sequence length: 1024
  • Precision: bf16
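
The actual run used the open-instruct DGX Spark script listed under Training Script below, not the code here. As a rough, unofficial sketch of the same configuration using TRL's SFTTrainer (field names such as max_seq_length vary between TRL versions; with 4 accumulation steps the effective batch is 128 sequences, assuming the batch size above is per device):

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Rough approximation only: the real run used open-instruct's DGX Spark SFT script,
# not TRL, and some SFTConfig field names differ between TRL versions.
dataset = load_dataset("HuggingFaceH4/no_robots", split="train")

config = SFTConfig(
    output_dir="qwen3-dgx-spark-sft",
    num_train_epochs=2,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    max_seq_length=1024,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",   # base model fine-tuned in this card
    train_dataset=dataset,
    args=config,
)
trainer.train()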

Hardware

  • GPU: NVIDIA GB10 (Blackwell, sm_121)
  • Memory: 128 GB unified (shared between CPU and GPU)
  • CUDA: 13.0
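
A quick way to confirm the device and bf16 support from PyTorch before launching a run (a minimal sketch; the expected values in the comments assume a CUDA 13 PyTorch build with Blackwell/sm_121 support):

import torch

# Device sanity check before training
print(torch.cuda.get_device_name(0))        # expected: an "NVIDIA GB10" device
print(torch.cuda.get_device_capability(0))  # expected: (12, 1) for sm_121
print(torch.cuda.is_bf16_supported())       # should be True; this run used bf16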

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("natolambert/qwen3-dgx-spark-sft")
tokenizer = AutoTokenizer.from_pretrained("natolambert/qwen3-dgx-spark-sft")

# Build a chat-formatted prompt and generate a response
messages = [{"role": "user", "content": "Write a short poem about machine learning."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
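
On a CUDA machine (including the DGX Spark itself), generation is faster if the model and inputs are moved to the GPU first; a minimal addition to the snippet above:

import torch

# Optional: run generation on the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
inputs = inputs.to(device)
outputs = model.generate(inputs, max_new_tokens=256)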

Training Script

Trained using the DGX Spark SFT script from open-instruct:

./scripts/train/dgx-spark/sft_no_robots.sh

See dgx-spark-setup for details on running ML training on DGX Spark.

Limitations

This is primarily a test model to validate the DGX Spark training pipeline. It has not been extensively evaluated for downstream tasks.
