Qwen3.5-0.8B

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.351 | 0.501 | 0.733 | 0.462 | 0.348 | 0.682 | 0.573 |
| q8-hi | 0.363 | 0.501 | 0.777 | 0.466 | 0.364 | 0.695 | 0.548 |
| q8 | 0.363 | 0.505 | 0.779 | 0.466 | 0.362 | 0.695 | 0.553 |
| q6-hi | 0.354 | 0.503 | 0.773 | 0.465 | 0.370 | 0.693 | 0.558 |
| q6 | 0.357 | 0.503 | 0.769 | 0.462 | 0.370 | 0.695 | 0.543 |
| mxfp4 | 0.339 | 0.489 | 0.738 | 0.433 | 0.330 | 0.672 | 0.553 |

tvall43/Qwen3.5-0.8B-Text-heretic

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.348 | 0.502 | 0.635 | 0.461 | 0.338 | 0.682 | 0.571 |
| mxfp4 | 0.333 | 0.495 | 0.673 | 0.432 | 0.330 | 0.670 | 0.552 |
Old model performance
Qwen3-0.6B

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| bf16 | 0.298 | 0.354 | 0.378 | 0.415 | 0.344 | 0.649 | 0.534 |
| q8-hi | 0.296 | 0.355 | 0.378 | 0.416 | 0.348 | 0.652 | 0.529 |
| q8 | 0.299 | 0.354 | 0.378 | 0.414 | 0.346 | 0.650 | 0.535 |
| q6-hi | 0.301 | 0.356 | 0.378 | 0.415 | 0.350 | 0.651 | 0.541 |
| q6 | 0.300 | 0.367 | 0.378 | 0.416 | 0.344 | 0.647 | 0.524 |
| mxfp4 | 0.286 | 0.364 | 0.609 | 0.404 | 0.316 | 0.626 | 0.531 |

| Quant | Perplexity | Peak memory |
|---|---|---|
| mxfp8 | 6.611 ± 0.049 | 7.65 GB |
| mxfp4 | 7.455 ± 0.057 | 6.33 GB |
Qwen3.5-9B

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.417 | 0.458 | 0.623 | 0.634 | 0.338 | 0.737 | 0.639 |
| mxfp4 | 0.419 | 0.472 | 0.622 | 0.634 | 0.352 | 0.739 | 0.644 |
Test run times by model

| Model | Run time |
|---|---|
| 0.8B-mxfp8 | 21:32 |
| 2B-mxfp8 | 41:25 |
| 4B-mxfp8 | 1:25:52 |
| 9B-mxfp8 | 2:33:18 |
| 27B-mxfp4 | 5:59:15 |
| 35B-A3B | 1:47:22 |
| 122B-A10B | 5:00:13 |
Every row shown in my metrics takes that long to produce, on an M4 Max. Expect roughly twice the run time on an M3 Max.
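Each row in the tables here is one full evaluation suite run. A minimal sketch of batching those runs in Python (the quant folder names are placeholders; assumes `mlx-lm` is installed so the `mlx_lm.evaluate` entry point is on `PATH`):

```python
import subprocess

# Task list used throughout the tables above.
TASKS = ["winogrande", "boolq", "arc_challenge", "arc_easy",
         "hellaswag", "openbookqa", "piqa"]

def eval_cmd(model_path: str) -> list[str]:
    """Build one mlx_lm.evaluate invocation for a local quant folder."""
    return ["mlx_lm.evaluate", "--model", model_path, "--tasks", *TASKS]

def run_suite(quants: list[str]) -> None:
    # Each call is one full suite run -- budget hours per model (see the times above).
    for quant in quants:
        subprocess.run(eval_cmd(quant), check=True)

# Example usage (placeholder folder names):
# run_suite(["Qwen3.5-9B-mxfp8-mlx", "Qwen3.5-9B-mxfp4-mlx"])
```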
Straight quant metrics
These will be posted soon, both here and on the model card; they simply take longer to run. I chose the path most helpful to people who want to do merges or training and need to know the starting point. I noticed that merges are more successful when the models "align", and it's nice to know that starting point.
Running metrics
This is really simple:

```shell
mlx_lm.evaluate --model Qwen3.5-9B-mxfp8-mlx --tasks winogrande boolq arc_challenge arc_easy hellaswag openbookqa piqa
```

That produces a results file like the one below; pick out the `acc_norm` values (plain `acc` for the tasks that do not report a norm). I will share my automation soon.
```json
{
  "arc_challenge": {
    "alias": "arc_challenge",
    "acc,none": 0.3796928327645051,
    "acc_stderr,none": 0.014182119866974712,
    "acc_norm,none": 0.41723549488054607,
    "acc_norm_stderr,none": 0.014409825518403108
  },
  "arc_easy": {
    "alias": "arc_easy",
    "acc,none": 0.5227272727272727,
    "acc_stderr,none": 0.010249179090606133,
    "acc_norm,none": 0.4583333333333333,
    "acc_norm_stderr,none": 0.010224097209176468
  },
  "boolq": {
    "alias": "boolq",
    "acc,none": 0.6232415902140673,
    "acc_stderr,none": 0.008475244400491348
  },
  "hellaswag": {
    "alias": "hellaswag",
    "acc,none": 0.5109539932284406,
    "acc_stderr,none": 0.004988583820309417,
    "acc_norm,none": 0.6337382991435969,
    "acc_norm_stderr,none": 0.004807975515446552
  },
  "openbookqa": {
    "alias": "openbookqa",
    "acc,none": 0.198,
    "acc_stderr,none": 0.017838958963847292,
    "acc_norm,none": 0.338,
    "acc_norm_stderr,none": 0.021175665695209452
  },
  "piqa": {
    "alias": "piqa",
    "acc,none": 0.7486398258977149,
    "acc_stderr,none": 0.010121156016819219,
    "acc_norm,none": 0.7372143634385201,
    "acc_norm_stderr,none": 0.01026935406814087
  },
  "winogrande": {
    "alias": "winogrande",
    "acc,none": 0.6393054459352802,
    "acc_stderr,none": 0.01349606439423404
  }
}
```
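Picking the norm values out of that file can be scripted. A minimal sketch (this is not the author's automation, which has not been posted yet; the `results.json` path is a placeholder):

```python
import json

# Column order used in the tables above.
ORDER = ["arc_challenge", "arc_easy", "boolq", "hellaswag",
         "openbookqa", "piqa", "winogrande"]

def metrics_row(results: dict) -> str:
    """One comma-separated table row: acc_norm where present,
    plain acc otherwise (boolq and winogrande only report acc)."""
    vals = [results[t].get("acc_norm,none", results[t]["acc,none"])
            for t in ORDER]
    return ",".join(f"{v:.3f}" for v in vals)

# Example usage (placeholder path):
# with open("results.json") as f:
#     print(metrics_row(json.load(f)))
```

Run against the JSON above, this reproduces the Qwen3.5-9B mxfp8 row exactly.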
Qwen3.5-4B

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.392 | 0.441 | 0.627 | 0.601 | 0.360 | 0.739 | 0.590 |
| q6-hi | 0.398 | 0.436 | 0.622 | 0.601 | 0.366 | 0.733 | 0.589 |
| q6 | 0.392 | 0.437 | 0.622 | 0.604 | 0.372 | 0.736 | 0.590 |
| mxfp4 | 0.371 | 0.444 | 0.632 | 0.585 | 0.356 | 0.732 | 0.548 |

| Quant | Perplexity | Peak memory |
|---|---|---|
| mxfp8 | 4.953 ± 0.035 | 9.61 GB |
| mxfp4 | 5.209 ± 0.037 | 7.65 GB |

tvall43/Qwen3.5-4B-Text-heretic

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.507 | 0.686 | 0.881 | 0.654 | 0.424 | 0.756 | 0.660 |
| mxfp4 | 0.480 | 0.656 | 0.878 | 0.635 | 0.418 | 0.742 | 0.624 |

Qwen3-4B-Thinking-2507

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp4 | 0.381 | 0.408 | 0.686 | 0.516 | 0.364 | 0.701 | 0.585 |

Qwen3-4B-Instruct-2507

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| dwq5 | 0.449 | 0.588 | 0.843 | 0.458 | 0.394 | 0.697 | 0.556 |
Qwen3.5-2B

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.410 | 0.540 | 0.843 | 0.560 | 0.374 | 0.715 | 0.577 |
| q8-hi | 0.410 | 0.542 | 0.818 | 0.563 | 0.378 | 0.718 | 0.582 |
| q8 | 0.411 | 0.539 | 0.819 | 0.563 | 0.378 | 0.718 | 0.577 |
| q6-hi | 0.404 | 0.542 | 0.821 | 0.560 | 0.372 | 0.715 | 0.575 |
| q6 | 0.411 | 0.540 | 0.818 | 0.562 | 0.378 | 0.717 | 0.579 |
| mxfp4 | 0.395 | 0.511 | 0.826 | 0.543 | 0.364 | 0.711 | 0.549 |

tvall43/Qwen3.5-2B-Text-heretic

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.412 | 0.547 | 0.832 | 0.560 | 0.382 | 0.713 | 0.582 |
| mxfp4 | 0.403 | 0.508 | 0.808 | 0.542 | 0.354 | 0.711 | 0.563 |

DavidAU/Qwen3.5-2B-Polaris-HighIQ-Thinking-x4

| quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.478 | 0.688 | 0.842 | 0.553 | 0.402 | 0.722 | 0.600 |
| mxfp4 | 0.430 | 0.621 | 0.826 | 0.544 | 0.378 | 0.723 | 0.585 |

| Quant | Perplexity | Peak memory |
|---|---|---|
| mxfp8 | 5.558 ± 0.039 | 7.65 GB |
| mxfp4 | 6.073 ± 0.044 | 6.71 GB |