# Chimere MeZO LoRA for Qwen3.5-35B-A3B
A LoRA adapter trained with MeZO (Memory-Efficient Zeroth-Order optimizer) on Qwen3.5-35B-A3B. MeZO estimates gradients by finite differences over forward passes, so it needs only inference-level memory: no backward pass, no gradient or optimizer-state storage.
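The finite-difference idea can be sketched in a few lines. This is a minimal, self-contained illustration of a single MeZO-style (SPSA) step, not the adapter's actual training code: the same random perturbation `z` is replayed from a seed rather than stored, which is what keeps memory at inference level. The function name and quadratic test below are illustrative.

```python
import torch

def mezo_step(params, loss_fn, lr=1e-6, eps=1e-3, seed=0):
    """One MeZO-style step: two forward passes, no backward pass.

    params:  list of trainable tensors (e.g. LoRA weights).
    loss_fn: zero-argument callable returning the scalar loss as a float.
    Defaults mirror the card's hyperparameters (lr=1e-6, eps=1e-3).
    """
    # Perturb to theta + eps*z, seeding the RNG so z can be replayed later.
    torch.manual_seed(seed)
    for p in params:
        p.add_(torch.randn_like(p), alpha=eps)
    loss_plus = loss_fn()

    # Replay the same z and move to theta - eps*z (subtract 2*eps*z).
    torch.manual_seed(seed)
    for p in params:
        p.sub_(torch.randn_like(p), alpha=2 * eps)
    loss_minus = loss_fn()

    # Projected gradient estimate along z, then restore theta and step.
    grad_est = (loss_plus - loss_minus) / (2 * eps)
    torch.manual_seed(seed)
    for p in params:
        z = torch.randn_like(p)
        p.add_(z, alpha=eps - lr * grad_est)  # +eps*z restores theta; -lr*grad_est*z is the update
    return (loss_plus + loss_minus) / 2
```

Because `z` is regenerated from the seed instead of kept in memory, the per-step overhead over plain inference is negligible; the trade-off is a noisy gradient estimate, hence the very small learning rate.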
## Key specs
| Metric | Value |
|---|---|
| Method | MeZO (zeroth-order) |
| LoRA rank | 4 |
| Training pairs | 47 |
| Training time | 54 seconds |
| Learning rate | 1e-6 |
| Epsilon | 0.001 |
| Final loss | 14.96 |
| File size | 340 KB |
## Why this matters
LoRA training on 35B MoE models normally requires >32 GB VRAM (even in 4-bit, the backward pass plus optimizer states exceed 16 GB). MeZO sidesteps this entirely: it trains at inference cost (~14 GB VRAM for IQ3_S). This adapter was produced on a single RTX 5060 Ti 16 GB in under a minute.
## Limitations
- Only 47 training pairs and a single optimization step: this is a proof of concept, not a production adapter
- The final loss of 14.96 is high; more data and more steps are needed for a meaningful quality improvement
- Targets attention layers only (MoE experts frozen)
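The attention-only targeting described above would correspond to a rank-4 PEFT configuration along these lines. This is a hedged sketch: the `target_modules` names are the usual Qwen attention projection names and the `lora_alpha` value is illustrative, neither is read from this adapter's actual config.

```python
from peft import LoraConfig

# Rank-4 adapter on attention projections only; MoE expert layers
# are untouched and therefore stay frozen. Module names assumed
# from the standard Qwen architecture, not this adapter's files.
config = LoraConfig(
    r=4,
    lora_alpha=8,  # alpha is not stated on the card; illustrative value
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

Restricting LoRA to the attention projections keeps the trainable-parameter count tiny (consistent with the 340 KB file size) and avoids touching the MoE routing and expert weights.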
## Usage
Load with PEFT or merge manually. Base model: `Qwen/Qwen3.5-35B-A3B`
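A minimal loading sketch with PEFT, assuming the adapter files are available locally; `"path/to/adapter"` is a placeholder, not a real repo path. Note that loading the base model requires downloading its full weights.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-35B-A3B", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "path/to/adapter")

# Optional: fold the LoRA deltas into the base weights for
# adapter-free inference (this is the "merge manually" route).
merged = model.merge_and_unload()
```

After `merge_and_unload()` the result is a plain `transformers` model with the adapter baked in, which can be saved or quantized like any other checkpoint.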
## Related
- `chimere-odo/quality/nightly_lora.py` – nightly training pipeline
- `ramp-quant` – quantization pipeline

## Author
Kevin Remondiere, independent ML researcher, Oloron-Sainte-Marie, France