Chimere MeZO LoRA – Qwen3.5-35B-A3B

LoRA adapter trained using MeZO (the Memory-Efficient Zeroth-Order optimizer) on Qwen3.5-35B-A3B. MeZO computes gradient estimates via finite differences, requiring only inference-level memory: no backward pass, no gradient storage.
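For context, the MeZO update is a simultaneous-perturbation (SPSA) estimate: perturb the trainable weights by ±ε·z for a random direction z, take the difference of the two forward-pass losses, and step along z. The sketch below is illustrative only; `params` and `loss_fn` are placeholders rather than this repository's training code, and only the lr and eps defaults mirror the specs above.

```python
import torch

@torch.no_grad()
def mezo_step(params, loss_fn, lr=1e-6, eps=1e-3):
    """One MeZO (SPSA) step: two forward passes, no backward pass.
    `params` and `loss_fn` are illustrative placeholders."""
    # A single random seed is enough to reproduce the perturbation direction z,
    # so z never has to be stored alongside the weights.
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        gen = torch.Generator(device=params[0].device).manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen, device=p.device, dtype=p.dtype)
            p.add_(scale * eps * z)

    perturb(+1.0)
    loss_plus = loss_fn()    # forward pass at theta + eps * z
    perturb(-2.0)
    loss_minus = loss_fn()   # forward pass at theta - eps * z
    perturb(+1.0)            # restore the original weights

    # Projected gradient estimate from the two losses.
    grad_scale = (loss_plus - loss_minus) / (2 * eps)

    # Replay the same z (same seed) to apply the update in place.
    gen = torch.Generator(device=params[0].device).manual_seed(seed)
    for p in params:
        z = torch.randn(p.shape, generator=gen, device=p.device, dtype=p.dtype)
        p.sub_(lr * grad_scale * z)
```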

Key specs

Metric          Value
Method          MeZO (zeroth-order)
LoRA rank       4
Training pairs  47
Training time   54 seconds
Learning rate   1e-6
Epsilon         0.001
Final loss      14.96
File size       340 KB

Why this matters

LoRA training on a 35B MoE model normally requires more than 32 GB of VRAM (even in 4-bit, the backward pass plus optimizer states exceed 16 GB). MeZO sidesteps this entirely: it trains at inference cost (~14 GB VRAM for the IQ3_S quant). This adapter was produced on a single RTX 5060 Ti 16 GB in under a minute.

Limitations

  • Only 47 training pairs and 1 optimization step: this is a proof of concept, not a production adapter
  • The final loss of 14.96 is high; more data and more steps are needed for a meaningful quality improvement
  • Targets the attention layers only (MoE experts are frozen); see the configuration sketch below
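
For reference, a rank-4, attention-only setup typically looks like the PEFT configuration below. This is a hedged reconstruction: the target_modules names and lora_alpha are assumptions, not values read from this adapter's files.

```python
from peft import LoraConfig

# Hypothetical reconstruction of a rank-4, attention-only LoRA config.
config = LoraConfig(
    r=4,
    lora_alpha=8,  # assumed value, not stated in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention only
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
```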

Usage

Load the adapter with PEFT or merge it into the base model manually. Base model: Qwen/Qwen3.5-35B-A3B.
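
A minimal loading sketch using the standard transformers + PEFT workflow; quantization and device-map settings needed to fit a 16 GB GPU are left out.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3.5-35B-A3B"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the adapter.
model = PeftModel.from_pretrained(base, "Kevletesteur/chimere-mezo-lora")

# Optional: merge the LoRA weights into the base model for standalone use.
merged = model.merge_and_unload()
```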

Author

Kevin Remondiere – Independent ML researcher, Oloron-Sainte-Marie, France
