---
base_model: Jackrong/Qwopus3.5-9B-v3
tags: [qwen3.5, reasoning, grpo, dapo, dpo, self-correction, verifiable-rewards]
license: apache-2.0
language: [en]
pipeline_tag: text-generation
---

# Qwopus3.5-9B-v4

## Training Pipeline

```
Jackrong/Qwopus3.5-9B-v3 (clean base, 87.8% HumanEval)
  -> Phase 1: DAPO-GRPO, 150 steps (native TRL)
       loss=dapo, clip=[0.2, 0.28], beta=0, mask_truncated=True
       Multiplicative reward: tags as entry fee, filler penalized
  -> Phase 2: SAI-DPO Round 1 (self-generated preference pairs)
  -> Phase 3: SAI-DPO Round 2 (hard-example mining)
  -> This model
```
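The Phase 1 settings above map onto TRL's GRPO trainer configuration. A minimal sketch of what that config might look like, assuming a recent TRL release (parameter names such as `loss_type`, `epsilon_high`, and `mask_truncated_completions` follow current TRL and may differ across versions; this is illustrative, not the actual training script):

```python
# Hypothetical Phase 1 config sketch for TRL's GRPOTrainer.
# Values taken from the pipeline diagram above; names assume recent TRL.
from trl import GRPOConfig

config = GRPOConfig(
    loss_type="dapo",                 # token-level DAPO loss
    epsilon=0.2,                      # lower clip bound
    epsilon_high=0.28,                # asymmetric upper clip bound
    beta=0.0,                         # no KL penalty against the reference model
    mask_truncated_completions=True,  # overlong filtering
    max_steps=150,
)
```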

## Key Innovation: Multiplicative Rewards

- No tags + correct = score x 0.1 (format failure)
- Filler thinking = score x 0.15, minus 0.5 (canned-phrase penalty)
- Full tags + correct = score x 1.0, plus bonuses (full credit)
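The shaping rules above can be sketched as a small reward function. This is an illustrative reconstruction from the three bullets, not the actual training code; the function name, argument names, and the omitted bonus terms are assumptions:

```python
def shaped_reward(base_score, has_tags, is_filler):
    """Hypothetical sketch of the multiplicative reward shaping described
    above. Format acts as a multiplicative "entry fee" on the correctness
    score, so a correct answer with broken formatting still scores poorly.
    """
    if is_filler:                # canned "thinking" phrases
        return base_score * 0.15 - 0.5
    if not has_tags:             # reasoning tags missing
        return base_score * 0.1
    return base_score * 1.0      # full credit (bonuses omitted here)
```

Because the penalties multiply the correctness score rather than adding to it, the model cannot trade formatting for correctness: a fully correct answer without tags keeps only a tenth of its reward.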

## DAPO Features (Native TRL)

- Token-level loss (no length bias)
- Asymmetric clipping (epsilon_high=0.28)
- Overlong filtering (mask_truncated=True)
- No KL penalty (beta=0)
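The first two bullets can be illustrated numerically. The sketch below shows a PPO-style clipped surrogate with DAPO's asymmetric range `[1 - 0.2, 1 + 0.28]` and token-level averaging across the whole batch; it is a toy illustration of the objective, not TRL's implementation:

```python
import math

def dapo_token_loss(logp_new, logp_old, advantage, eps_low=0.2, eps_high=0.28):
    """Per-token clipped surrogate with DAPO's asymmetric clip range.
    The upper bound (1 + eps_high) leaves more headroom for upweighting
    tokens than the lower bound (1 - eps_low) leaves for downweighting.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps_low, min(ratio, 1.0 + eps_high))
    # Pessimistic PPO-style objective, negated to give a loss.
    return -min(ratio * advantage, clipped * advantage)

def batch_loss(tokens):
    """Token-level averaging: every token weighs equally across the batch
    (no per-sequence mean first), which removes the length bias of
    sequence-level GRPO averaging."""
    losses = [dapo_token_loss(*t) for t in tokens]
    return sum(losses) / len(losses)
```

For example, a token whose probability doubles (`ratio = 2.0`) with advantage `+1` is clipped at `1.28`, while the symmetric PPO default would have clipped it at `1.2`.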

## Files

- `merged-model/`: full merged safetensors weights

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    'MeridianVector/Qwopus3.5-9B-v4',
    subfolder='merged-model',
    torch_dtype='auto',
    device_map='auto',
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    'MeridianVector/Qwopus3.5-9B-v4',
    subfolder='merged-model',
    trust_remote_code=True,
)
```

## Acknowledgements

- [Jackrong](https://huggingface.co/Jackrong) for Qwopus3.5-9B-v3
- [Qwen Team](https://huggingface.co/Qwen) for Qwen3.5-9B

Trained on Google Colab G4 (RTX PRO 6000 Blackwell, 96 GB).
Benchmarks pending.