# Chimere MeZO LoRA for Qwen3.5-35B-A3B
A LoRA adapter trained with MeZO (Memory-Efficient Zeroth-Order optimizer) on Qwen3.5-35B-A3B. MeZO estimates gradients by finite differences over forward passes, so it needs only inference-level memory: no backward pass, no gradient or optimizer-state storage.
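The finite-difference idea can be sketched in a few lines. This is a minimal, self-contained illustration of a single MeZO-style (SPSA) step, not the adapter's actual training code: the same random perturbation `z` is replayed from a seed rather than stored, which is what keeps memory at inference level. The function name and quadratic test below are illustrative.

```python
import torch

def mezo_step(params, loss_fn, lr=1e-6, eps=1e-3, seed=0):
    """One MeZO-style step: two forward passes, no backward pass.

    params:  list of trainable tensors (e.g. LoRA weights).
    loss_fn: zero-argument callable returning the scalar loss as a float.
    Defaults mirror the card's hyperparameters (lr=1e-6, eps=1e-3).
    """
    # Perturb to theta + eps*z, seeding the RNG so z can be replayed later.
    torch.manual_seed(seed)
    for p in params:
        p.add_(torch.randn_like(p), alpha=eps)
    loss_plus = loss_fn()

    # Replay the same z and move to theta - eps*z (subtract 2*eps*z).
    torch.manual_seed(seed)
    for p in params:
        p.sub_(torch.randn_like(p), alpha=2 * eps)
    loss_minus = loss_fn()

    # Projected gradient estimate along z, then restore theta and step.
    grad_est = (loss_plus - loss_minus) / (2 * eps)
    torch.manual_seed(seed)
    for p in params:
        z = torch.randn_like(p)
        p.add_(z, alpha=eps - lr * grad_est)  # +eps*z restores theta; -lr*grad_est*z is the update
    return (loss_plus + loss_minus) / 2
```

Because `z` is regenerated from the seed instead of kept in memory, the per-step overhead over plain inference is negligible; the trade-off is a noisy gradient estimate, hence the very small learning rate.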
## Key specs
| Metric | Value |
|---|---|
| Method | MeZO (zeroth-order) |
| LoRA rank | 4 |
| Training pairs | 47 |
| Training time | 54 seconds |
| Learning rate | 1e-6 |
| Epsilon | 0.001 |
| Final loss | 14.96 |
| File size | 340 KB |
## Why this matters
LoRA training on 35B MoE models normally requires >32 GB VRAM (even in 4-bit, the backward pass plus optimizer states exceed 16 GB). MeZO sidesteps this entirely: it trains at inference cost (~14 GB VRAM for IQ3_S). This adapter was produced on a single RTX 5060 Ti 16 GB in under a minute.
## Limitations
- Only 47 training pairs and a single optimization step: this is a proof of concept, not a production adapter
- The final loss of 14.96 is high; more data and more steps are needed for a meaningful quality improvement
- Targets attention layers only (MoE experts frozen)
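The attention-only targeting described above would correspond to a rank-4 PEFT configuration along these lines. This is a hedged sketch: the `target_modules` names are the usual Qwen attention projection names and the `lora_alpha` value is illustrative, neither is read from this adapter's actual config.

```python
from peft import LoraConfig

# Rank-4 adapter on attention projections only; MoE expert layers
# are untouched and therefore stay frozen. Module names assumed
# from the standard Qwen architecture, not this adapter's files.
config = LoraConfig(
    r=4,
    lora_alpha=8,  # alpha is not stated on the card; illustrative value
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

Restricting LoRA to the attention projections keeps the trainable-parameter count tiny (consistent with the 340 KB file size) and avoids touching the MoE routing and expert weights.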
## Usage
Load with PEFT or merge manually. Base model: `Qwen/Qwen3.5-35B-A3B`
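A minimal loading sketch with PEFT, assuming the adapter files are available locally; `"path/to/adapter"` is a placeholder, not a real repo path. Note that loading the base model requires downloading its full weights.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-35B-A3B", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "path/to/adapter")

# Optional: fold the LoRA deltas into the base weights for
# adapter-free inference (this is the "merge manually" route).
merged = model.merge_and_unload()
```

After `merge_and_unload()` the result is a plain `transformers` model with the adapter baked in, which can be saved or quantized like any other checkpoint.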
## Related
- `chimere-odo/quality/nightly_lora.py` – nightly training pipeline
- `ramp-quant` – quantization pipeline

## Author
Kevin Remondiere, independent ML researcher, Oloron-Sainte-Marie, France