Qwen3.5-0.8B

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.351 | 0.501 | 0.733 | 0.462 | 0.348 | 0.682 | 0.573 |
| q8-hi | 0.363 | 0.501 | 0.777 | 0.466 | 0.364 | 0.695 | 0.548 |
| q8 | 0.363 | 0.505 | 0.779 | 0.466 | 0.362 | 0.695 | 0.553 |
| q6-hi | 0.354 | 0.503 | 0.773 | 0.465 | 0.370 | 0.693 | 0.558 |
| q6 | 0.357 | 0.503 | 0.769 | 0.462 | 0.370 | 0.695 | 0.543 |
| mxfp4 | 0.339 | 0.489 | 0.738 | 0.433 | 0.330 | 0.672 | 0.553 |

tvall43/Qwen3.5-0.8B-Text-heretic

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.348 | 0.502 | 0.635 | 0.461 | 0.338 | 0.682 | 0.571 |
| mxfp4 | 0.333 | 0.495 | 0.673 | 0.432 | 0.330 | 0.670 | 0.552 |
Old model performance
Qwen3-0.6B

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| bf16 | 0.298 | 0.354 | 0.378 | 0.415 | 0.344 | 0.649 | 0.534 |
| q8-hi | 0.296 | 0.355 | 0.378 | 0.416 | 0.348 | 0.652 | 0.529 |
| q8 | 0.299 | 0.354 | 0.378 | 0.414 | 0.346 | 0.650 | 0.535 |
| q6-hi | 0.301 | 0.356 | 0.378 | 0.415 | 0.350 | 0.651 | 0.541 |
| q6 | 0.300 | 0.367 | 0.378 | 0.416 | 0.344 | 0.647 | 0.524 |
| mxfp4 | 0.286 | 0.364 | 0.609 | 0.404 | 0.316 | 0.626 | 0.531 |

| Quant | Perplexity | Peak memory |
|---|---|---|
| mxfp8 | 6.611 ± 0.049 | 7.65 GB |
| mxfp4 | 7.455 ± 0.057 | 6.33 GB |
Qwen3.5-9B

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.417 | 0.458 | 0.623 | 0.634 | 0.338 | 0.737 | 0.639 |
| mxfp4 | 0.419 | 0.472 | 0.622 | 0.634 | 0.352 | 0.739 | 0.644 |
Test run times by model

| Model | Run time |
|---|---|
| 0.8B-mxfp8 | 21:32 |
| 2B-mxfp8 | 41:25 |
| 4B-mxfp8 | 1:25:52 |
| 9B-mxfp8 | 2:33:18 |
| 27B-mxfp4 | 5:59:15 |
| 35B-A3B | 1:47:22 |
| 122B-A10B | 5:00:13 |
Every row shown in my metrics takes that long to produce, on an M4 Max. Expect roughly twice the run time on an M3 Max.
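Each row in the tables here is one full evaluation suite run. A minimal sketch of batching those runs in Python (the quant folder names are placeholders; assumes `mlx-lm` is installed so the `mlx_lm.evaluate` entry point is on `PATH`):

```python
import subprocess

# Task list used throughout the tables above.
TASKS = ["winogrande", "boolq", "arc_challenge", "arc_easy",
         "hellaswag", "openbookqa", "piqa"]

def eval_cmd(model_path: str) -> list[str]:
    """Build one mlx_lm.evaluate invocation for a local quant folder."""
    return ["mlx_lm.evaluate", "--model", model_path, "--tasks", *TASKS]

def run_suite(quants: list[str]) -> None:
    # Each call is one full suite run -- budget hours per model (see the times above).
    for quant in quants:
        subprocess.run(eval_cmd(quant), check=True)

# Example usage (placeholder folder names):
# run_suite(["Qwen3.5-9B-mxfp8-mlx", "Qwen3.5-9B-mxfp4-mlx"])
```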
Straight quant metrics
These will be posted soon, both here and on the model card; they simply take longer to run. I chose the path most helpful to people who want to do merges or training and need to know the starting point. I noticed that merges are more successful when the models "align", and it's nice to know that starting point.
Running metrics
This is really simple:

```shell
mlx_lm.evaluate --model Qwen3.5-9B-mxfp8-mlx --tasks winogrande boolq arc_challenge arc_easy hellaswag openbookqa piqa
```

That produces a results file like the one below; pick out the `acc_norm` values (plain `acc` for the tasks that do not report a norm). I will share my automation soon.
```json
{
  "arc_challenge": {
    "alias": "arc_challenge",
    "acc,none": 0.3796928327645051,
    "acc_stderr,none": 0.014182119866974712,
    "acc_norm,none": 0.41723549488054607,
    "acc_norm_stderr,none": 0.014409825518403108
  },
  "arc_easy": {
    "alias": "arc_easy",
    "acc,none": 0.5227272727272727,
    "acc_stderr,none": 0.010249179090606133,
    "acc_norm,none": 0.4583333333333333,
    "acc_norm_stderr,none": 0.010224097209176468
  },
  "boolq": {
    "alias": "boolq",
    "acc,none": 0.6232415902140673,
    "acc_stderr,none": 0.008475244400491348
  },
  "hellaswag": {
    "alias": "hellaswag",
    "acc,none": 0.5109539932284406,
    "acc_stderr,none": 0.004988583820309417,
    "acc_norm,none": 0.6337382991435969,
    "acc_norm_stderr,none": 0.004807975515446552
  },
  "openbookqa": {
    "alias": "openbookqa",
    "acc,none": 0.198,
    "acc_stderr,none": 0.017838958963847292,
    "acc_norm,none": 0.338,
    "acc_norm_stderr,none": 0.021175665695209452
  },
  "piqa": {
    "alias": "piqa",
    "acc,none": 0.7486398258977149,
    "acc_stderr,none": 0.010121156016819219,
    "acc_norm,none": 0.7372143634385201,
    "acc_norm_stderr,none": 0.01026935406814087
  },
  "winogrande": {
    "alias": "winogrande",
    "acc,none": 0.6393054459352802,
    "acc_stderr,none": 0.01349606439423404
  }
}
```
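Picking the norm values out of that file can be scripted. A minimal sketch (this is not the author's automation, which has not been posted yet; the `results.json` path is a placeholder):

```python
import json

# Column order used in the tables above.
ORDER = ["arc_challenge", "arc_easy", "boolq", "hellaswag",
         "openbookqa", "piqa", "winogrande"]

def metrics_row(results: dict) -> str:
    """One comma-separated table row: acc_norm where present,
    plain acc otherwise (boolq and winogrande only report acc)."""
    vals = [results[t].get("acc_norm,none", results[t]["acc,none"])
            for t in ORDER]
    return ",".join(f"{v:.3f}" for v in vals)

# Example usage (placeholder path):
# with open("results.json") as f:
#     print(metrics_row(json.load(f)))
```

Run against the JSON above, this reproduces the Qwen3.5-9B mxfp8 row exactly.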
Qwen3.5-4B

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.392 | 0.441 | 0.627 | 0.601 | 0.360 | 0.739 | 0.590 |
| q6-hi | 0.398 | 0.436 | 0.622 | 0.601 | 0.366 | 0.733 | 0.589 |
| q6 | 0.392 | 0.437 | 0.622 | 0.604 | 0.372 | 0.736 | 0.590 |
| mxfp4 | 0.371 | 0.444 | 0.632 | 0.585 | 0.356 | 0.732 | 0.548 |

| Quant | Perplexity | Peak memory |
|---|---|---|
| mxfp8 | 4.953 ± 0.035 | 9.61 GB |
| mxfp4 | 5.209 ± 0.037 | 7.65 GB |

tvall43/Qwen3.5-4B-Text-heretic

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.507 | 0.686 | 0.881 | 0.654 | 0.424 | 0.756 | 0.660 |
| mxfp4 | 0.480 | 0.656 | 0.878 | 0.635 | 0.418 | 0.742 | 0.624 |

Qwen3-4B-Thinking-2507

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp4 | 0.381 | 0.408 | 0.686 | 0.516 | 0.364 | 0.701 | 0.585 |

Qwen3-4B-Instruct-2507

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| dwq5 | 0.449 | 0.588 | 0.843 | 0.458 | 0.394 | 0.697 | 0.556 |
Qwen3.5-2B

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.410 | 0.540 | 0.843 | 0.560 | 0.374 | 0.715 | 0.577 |
| q8-hi | 0.410 | 0.542 | 0.818 | 0.563 | 0.378 | 0.718 | 0.582 |
| q8 | 0.411 | 0.539 | 0.819 | 0.563 | 0.378 | 0.718 | 0.577 |
| q6-hi | 0.404 | 0.542 | 0.821 | 0.560 | 0.372 | 0.715 | 0.575 |
| q6 | 0.411 | 0.540 | 0.818 | 0.562 | 0.378 | 0.717 | 0.579 |
| mxfp4 | 0.395 | 0.511 | 0.826 | 0.543 | 0.364 | 0.711 | 0.549 |

tvall43/Qwen3.5-2B-Text-heretic

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.412 | 0.547 | 0.832 | 0.560 | 0.382 | 0.713 | 0.582 |
| mxfp4 | 0.403 | 0.508 | 0.808 | 0.542 | 0.354 | 0.711 | 0.563 |

DavidAU/Qwen3.5-2B-Polaris-HighIQ-Thinking-x4

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.478 | 0.688 | 0.842 | 0.553 | 0.402 | 0.722 | 0.600 |
| mxfp4 | 0.430 | 0.621 | 0.826 | 0.544 | 0.378 | 0.723 | 0.585 |

| Quant | Perplexity | Peak memory |
|---|---|---|
| mxfp8 | 5.558 ± 0.039 | 7.65 GB |
| mxfp4 | 6.073 ± 0.044 | 6.71 GB |