DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling (arXiv:2406.11617)
⚠️ Note: This model requires the ChatML chat template.
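For reference, ChatML wraps each turn in `<|im_start|>` and `<|im_end|>` markers. Below is a minimal sketch of building such a prompt by hand. The helper name `to_chatml` is hypothetical and not part of this model or any library; in practice the chat template shipped with the tokenizer would normally do this job.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for msg in messages:
        # Each turn: start marker, role, newline, content, end marker, newline.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

Generation should stop when the model emits `<|im_end|>`.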
This is a merge of pre-trained language models created using mergekit.
The model is partially censored but can be jailbroken or ablated if needed.
This model was merged with the DELLA merge method, using IntervitensInc/Mistral-Nemo-Base-2407-chatml as the base.
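To give a rough idea of the magnitude-based sampling that DELLA is named for, here is a simplified sketch of the drop-and-rescale step applied to one row of delta parameters (fine-tuned weights minus base weights). It assumes keep probabilities spread linearly from `density - epsilon` to `density + epsilon` by magnitude rank. The function name `magprune_row` is hypothetical, this is not mergekit's actual implementation, and the later sign-election and fusing steps are omitted.

```python
import random


def magprune_row(deltas, density, epsilon, rng):
    """Randomly drop entries of one delta row, favoring large magnitudes."""
    n = len(deltas)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(n), key=lambda i: abs(deltas[i]))
    out = [0.0] * n
    for rank, i in enumerate(order):
        # Assumption: keep probability rises linearly with magnitude rank,
        # from density - epsilon (smallest) to density + epsilon (largest).
        frac = rank / (n - 1) if n > 1 else 0.5
        keep_p = density - epsilon + 2.0 * epsilon * frac
        if rng.random() < keep_p:
            # Rescale survivors so the expected value matches the original.
            out[i] = deltas[i] / keep_p
    return out


rng = random.Random(0)
row = [0.02, -0.5, 0.001, 0.3, -0.07]
pruned = magprune_row(row, density=0.9, epsilon=0.099, rng=rng)
print(pruned)
```

With `density: 0.9` and `epsilon: 0.099`, each entry in this sketch is kept with a probability between 0.801 and 0.999, so most of every delta survives.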
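The following models were included in the merge:
- ChaoticNeutrals/Mag-Mell-Reasoner-12B
- Epiculous/Azure_Dusk-v0.2
- Epiculous/Crimson_Dawn-v0.2
- Epiculous/Violet_Twilight-v0.2
- GreenerPastures/Golden-Curry-12B
- inflatebot/MN-12B-Mag-Mell-R1
- PygmalionAI/Eleusis-12B
- PygmalionAI/Pygmalion-3-12B
- Sao10K/MN-12B-Lyra-v2a1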
The following YAML configuration was used to produce this model:
base_model: B:/12B/models--IntervitensInc--Mistral-Nemo-Base-2407-chatml
models:
  - model: B:/12B/models--IntervitensInc--Mistral-Nemo-Base-2407-chatml
  - model: B:/12B/models--ChaoticNeutrals--Mag-Mell-Reasoner-12B
    parameters:
      density: 0.9
      weight: 0.2
      epsilon: 0.099
  - model: B:/12B/models--Epiculous--Azure_Dusk-v0.2
    parameters:
      density: 0.9
      weight: 0.2
      epsilon: 0.099
  - model: B:/12B/models--Epiculous--Crimson_Dawn-v0.2
    parameters:
      density: 0.9
      weight: 0.2
      epsilon: 0.099
  - model: B:/12B/models--Epiculous--Violet_Twilight-v0.2
    parameters:
      density: 0.9
      weight: 0.2
      epsilon: 0.099
  - model: B:/12B/models--GreenerPastures--Golden-Curry-12B
    parameters:
      density: 0.9
      weight: 0.2
      epsilon: 0.099
  - model: B:/12B/models--inflatebot--MN-12B-Mag-Mell-R1
    parameters:
      density: 0.9
      weight: 0.2
      epsilon: 0.099
  - model: B:/12B/models--PygmalionAI--Eleusis-12B
    parameters:
      density: 0.9
      weight: 0.2
      epsilon: 0.099
  - model: B:/12B/models--PygmalionAI--Pygmalion-3-12B
    parameters:
      density: 0.9
      weight: 0.2
      epsilon: 0.099
  - model: B:/12B/models--Sao10K--MN-12B-Lyra-v2a1
    parameters:
      density: 0.9
      weight: 0.2
      epsilon: 0.099
merge_method: della
parameters:
  lambda: 1.0
  normalize: false
  int8_mask: false
dtype: bfloat16
tokenizer:
  source: "union"
  tokens:
    # Force ChatML EOS tokens
    "<|im_start|>":
      source: "B:/12B/models--IntervitensInc--Mistral-Nemo-Base-2407-chatml"
      force: true
    "<|im_end|>":
      source: "B:/12B/models--IntervitensInc--Mistral-Nemo-Base-2407-chatml"
      force: true
chat_template: "chatml"
name: 🦁 MagMalion-Twilight-12B-v1
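As a quick sanity check on the values in the configuration above, the following sketch restates the shared per-model parameters by hand (rather than parsing the YAML) and verifies that `density ± epsilon` stays inside (0, 1). It also prints the summed weights. Under the assumption that `normalize: false` leaves the weights unscaled, the nine weighted deltas add up to more than 1, which is worth keeping in mind when adjusting `weight` or `lambda`.

```python
# Values copied from the configuration above; the nine fine-tuned models share them.
NUM_MODELS = 9
DENSITY, WEIGHT, EPSILON = 0.9, 0.2, 0.099

# Bounds of the density +/- epsilon range used for sampling.
low, high = DENSITY - EPSILON, DENSITY + EPSILON
assert 0.0 < low < high < 1.0, "density +/- epsilon must stay inside (0, 1)"

# Combined weight of all deltas when no normalization is applied (assumption).
total_weight = NUM_MODELS * WEIGHT

print(f"probability range: [{low:.3f}, {high:.3f}]")
print(f"sum of weights: {total_weight:.1f}")
```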