---
base_model: Jackrong/Qwopus3.5-9B-v3
tags: [qwen3.5, reasoning, grpo, dapo, dpo, self-correction, verifiable-rewards]
license: apache-2.0
language: [en]
pipeline_tag: text-generation
---

# Qwopus3.5-9B-v4

## Training Pipeline
```
Jackrong/Qwopus3.5-9B-v3 (clean base, 87.8% HumanEval)
-> Phase 1: DAPO-GRPO, 150 steps (native TRL)
   loss=dapo, clip=[0.2, 0.28], beta=0, mask_truncated=True
   Multiplicative reward: tags as entry fee, filler penalized
-> Phase 2: SAI-DPO Round 1 (self-generated preference pairs)
-> Phase 3: SAI-DPO Round 2 (hard-example mining)
-> This model
```
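The Phase 1 hyperparameters map onto TRL's GRPO trainer roughly as follows. This is a sketch, assuming a recent TRL release where `loss_type="dapo"`, `epsilon_high`, and `mask_truncated_completions` are supported; the dataset and reward function are placeholders, not the actual training setup:

```python
from trl import GRPOConfig

# Sketch of the Phase 1 configuration described above (names from TRL's GRPOConfig).
config = GRPOConfig(
    output_dir="qwopus-dapo-grpo",
    loss_type="dapo",                  # token-level DAPO loss (no length bias)
    epsilon=0.2,                       # lower clipping bound
    epsilon_high=0.28,                 # asymmetric upper bound ("clip-higher")
    beta=0.0,                          # no KL penalty against a reference model
    mask_truncated_completions=True,   # overlong filtering
    max_steps=150,
)
# A GRPOTrainer would then be constructed with this config, the v3 base model,
# the multiplicative reward callable, and the prompt dataset (all omitted here).
```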

## Key Innovation: Multiplicative Rewards
- No tags + correct answer = score × 0.1 (format failure caps credit)
- Filler thinking = score × 0.15 − 0.5 (canned-phrase penalty)
- Full tags + correct answer = score × 1.0 + bonuses (full credit)
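A minimal sketch of how such a multiplicative gate could be wired up. Only the multipliers above come from this card; the tag regex and the filler-phrase list are illustrative assumptions:

```python
import re

# Assumed canned openers; the real filler detector is not published with this card.
FILLER_PHRASES = ("Let me think", "Okay, so")

def multiplicative_reward(completion: str, correct: bool) -> float:
    """Gate the correctness score by format: tags act as an entry fee."""
    base = 1.0 if correct else 0.0
    think = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    if think is None:
        return base * 0.1                # no tags: 90% of credit withheld
    body = think.group(1).strip()
    if any(body.startswith(p) for p in FILLER_PHRASES):
        return base * 0.15 - 0.5         # filler thinking: heavy penalty
    return base * 1.0                    # full tags: full credit (bonuses elsewhere)
```

Because the format term multiplies the correctness term instead of adding to it, a correct answer cannot buy back credit lost to bad formatting.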
26
+
27
+ ## DAPO Features (Native TRL)
28
+ - Token-level loss (no length bias)
29
+ - Asymmetric clipping (epsilon_high=0.28)
30
+ - Overlong filtering (mask_truncated=True)
31
+ - No KL penalty (beta=0)
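The asymmetric clipping can be illustrated in isolation. This is a toy, single-token sketch, not TRL's implementation, which applies the clip per token inside the batched loss:

```python
def dapo_token_objective(ratio: float, advantage: float,
                         eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """Per-token clipped surrogate with DAPO's clip-higher asymmetry."""
    clipped = max(1.0 - eps_low, min(ratio, 1.0 + eps_high))
    return min(ratio * advantage, clipped * advantage)
```

With `eps_high=0.28` the probability ratio of an up-weighted token can grow to 1.28 before clipping, while down-weighting is still capped at 0.8, which leaves more headroom for low-probability exploratory tokens than symmetric PPO-style clipping.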
32
+
33
+ ## Files
34
+ - `merged-model/` — Full merged safetensors
35
+
36
+ ## Usage
37
+ ```python
38
+ from transformers import AutoModelForCausalLM, AutoTokenizer
39
+ model = AutoModelForCausalLM.from_pretrained(
40
+ 'MeridianVector/Qwopus3.5-9B-v4', subfolder='merged-model',
41
+ torch_dtype='auto', device_map='auto', trust_remote_code=True)
42
+ tokenizer = AutoTokenizer.from_pretrained(
43
+ 'MeridianVector/Qwopus3.5-9B-v4', subfolder='merged-model', trust_remote_code=True)
44
+ ```

## Acknowledgements
- [Jackrong](https://huggingface.co/Jackrong) for Qwopus3.5-9B-v3
- [Qwen Team](https://huggingface.co/Qwen) for Qwen3.5-9B

Trained on Google Colab G4 (RTX PRO 6000 Blackwell, 96 GB). Benchmarks pending.