Gheorghe Chesler PRO

nightmedia

AI & ML interests

Nightmedia: human-like AI and the MLX Deckard(qx) Formula. Donations are appreciated: BTC:36d7U1n3MFaXgnNRAaEL3Pa3Hy6oFhM7XY

Recent Activity

updated a model about 2 hours ago

nightmedia/Qwen3.5-9B-TNG-PKD-Qwopus-Writer-MarkTwain-qx86-hi-mlx

published a model about 3 hours ago

nightmedia/Qwen3.5-9B-TNG-PKD-Qwopus-Writer-MarkTwain-qx86-hi-mlx

updated a model about 5 hours ago

nightmedia/Qwen3.5-9B-Fable-MarkTwain-qx86-hi-mlx


Organizations

Posts 7

Post

983

Gemma4 template issues

I ran into this while testing juiceb0xc0de/locus-gemma-4-e2b. The response was full of end-of-turn tags:

> I await your next query, G.<turn|><turn|>><turn|>Your formal request has been processed and analyzed. I am ready to continue the engagement when you are.<turn|>>

It turns out the fix is in LM Studio:

Why this is critical for Gemma 4 E2B

Gemma 4 models (especially edge variants like E2B-it) utilize Chain-of-Thought thinking layers and structural multi-turn tool schemas natively. The model shifts between channels like <|channel>thought and regular dialogue text seamlessly.

If LM Studio does not explicitly treat the closing tags as a hard cutoff signal, generation continues past the end of the turn. As a result, the model can get stuck in a loop, repeating turn markers and structure summaries instead of returning control to your prompt session.

cat ~/.lmstudio/config-presets/gemma4.preset.json 
{
  "identifier": "@local:gemma4",
  "name": "gemma4",
  "changed": false,
  "operation": {
    "fields": [
      {
        "key": "llm.prediction.stopStrings",
        "value": [
          "<turn|>",
          "<channel|>",
          "<eos>"
        ]
      }
    ]
  },
  "load": {
    "fields": []
  }
}

That is the exact configuration structure LM Studio requires.
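For anyone who prefers to generate the preset rather than hand-edit it, here is a minimal Python sketch that rebuilds the same JSON. The helper names `build_preset` and `write_preset` are hypothetical, and the `@local:<name>` identifier pattern and `<name>.preset.json` file name are only inferred from the single example above, not from LM Studio documentation.

```python
import json
from pathlib import Path


def build_preset(name, stop_strings):
    """Build a preset dict shaped like the gemma4 example in the post."""
    return {
        # Assumption: the identifier follows the "@local:<name>" pattern seen above.
        "identifier": f"@local:{name}",
        "name": name,
        "changed": False,
        "operation": {
            "fields": [
                {"key": "llm.prediction.stopStrings", "value": list(stop_strings)}
            ]
        },
        "load": {"fields": []},
    }


def write_preset(preset, directory):
    """Write the preset as <name>.preset.json inside `directory`; return the path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{preset['name']}.preset.json"
    path.write_text(json.dumps(preset, indent=2) + "\n", encoding="utf-8")
    return path


preset = build_preset("gemma4", ["<turn|>", "<channel|>", "<eos>"])
print(json.dumps(preset, indent=2))
```

Calling `write_preset(preset, Path.home() / ".lmstudio" / "config-presets")` would place the file at the path shown in the `cat` command above; back up any existing preset first.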

By saving it inside the llm.prediction.stopStrings operational field, LM Studio binds those terminal tokens directly into the underlying runtime client loop rather than the model's architectural blueprint. Every time you load this model profile, the inference wrapper will strictly police and discard those boundary markers before the streaming text token buffer writes to your chat window.
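To make the effect of stop strings concrete, here is a small Python sketch of the general technique: cut the generated text at the earliest occurrence of any stop string. It illustrates the idea only; it is not LM Studio's actual implementation, and `truncate_at_stop` is a hypothetical helper name.

```python
def truncate_at_stop(text, stop_strings):
    """Return `text` up to, but not including, the earliest stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


STOP_STRINGS = ["<turn|>", "<channel|>", "<eos>"]

# The beginning of the runaway response quoted at the top of this post.
raw = (
    "I await your next query, G.<turn|><turn|>><turn|>"
    "Your formal request has been processed and analyzed."
)

print(truncate_at_stop(raw, STOP_STRINGS))
# prints: I await your next query, G.
```

A real runtime applies the same check incrementally while streaming, so it can also stop generating tokens instead of only trimming the finished text.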

You have successfully stabilized a cutting-edge, programmatic Gemma 4 architecture inside a local GUI environment.

--Gemini

Post

1327

IBM Granite 4.1 series

New models came out; here is how they compare to models of similar size:

Brainwaves

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
granite-4.1-30b
mxfp8    0.456  0.572  0.897  0.621  0.444  0.757  0.616
mxfp4    0.453  0.565  0.892  0.624  0.442  0.759  0.585
qx86-hi  0.451  0.568  0.897  0.636  0.440  0.763  0.598

granite-4.1-8b
mxfp8    0.486  0.666  0.875  0.636  0.450  0.766  0.631

granite-4.1-3b
mxfp8    0.406  0.581  0.821  0.484  0.434  0.712  0.559

Gemma-4

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
gemma-4-E4B-it
mxfp8    0.480  0.656  0.797  0.608  0.400  0.755  0.665
mxfp4    0.455  0.607  0.851  0.585  0.402  0.744  0.651

gemma-4-E2B-it
mxfp8    0.376  0.464  0.743  0.490  0.378  0.709  0.622
mxfp4    0.380  0.451  0.762  0.494  0.374  0.699  0.594

Qwen3.5

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
Qwen3.5-9B
mxfp8    0.417  0.458  0.623  0.634  0.338  0.737  0.639
mxfp4    0.419  0.472  0.622  0.634  0.352  0.739  0.644

Qwen3.5-4B
mxfp8    0.392  0.441  0.627  0.601  0.360  0.739  0.590
mxfp4    0.371  0.444  0.632  0.585  0.356  0.732  0.548

Right out of the gate, IBM delivered models with better starting metrics than both Gemma and Qwen: at mxfp8, granite-4.1-8b beats gemma-4-E4B-it and Qwen3.5-9B on every benchmark here except wino. Training these should be fun :)
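The comparison can be checked with a few lines of Python. This sketch copies the mxfp8 rows for the mid-size models from the tables above and reports an unweighted mean plus the per-benchmark wins. The unweighted mean is an assumption made for illustration, not a metric used in this post, and the helper names are hypothetical.

```python
BENCHMARKS = ["arc", "arc/e", "boolq", "hswag", "obkqa", "piqa", "wino"]

# mxfp8 rows copied from the tables above.
SCORES = {
    "granite-4.1-8b": [0.486, 0.666, 0.875, 0.636, 0.450, 0.766, 0.631],
    "gemma-4-E4B-it": [0.480, 0.656, 0.797, 0.608, 0.400, 0.755, 0.665],
    "Qwen3.5-9B":     [0.417, 0.458, 0.623, 0.634, 0.338, 0.737, 0.639],
}


def mean_score(model):
    """Unweighted mean over the seven benchmarks (an illustrative choice)."""
    values = SCORES[model]
    return sum(values) / len(values)


def wins(a, b):
    """Names of the benchmarks where model `a` scores strictly higher than `b`."""
    return [name for name, x, y in zip(BENCHMARKS, SCORES[a], SCORES[b]) if x > y]


for model in sorted(SCORES, key=mean_score, reverse=True):
    print(f"{model:16s} mean={mean_score(model):.3f}")

for other in ("gemma-4-E4B-it", "Qwen3.5-9B"):
    won = wins("granite-4.1-8b", other)
    print(f"granite-4.1-8b beats {other} on {len(won)}/{len(BENCHMARKS)}: {won}")
```

The same dictionary can be extended with the other rows to compare quants or the smaller models.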

Here is the Nightmedia collection of Granite models:

https://huggingface.co/collections/nightmedia/ibm-granite-41

-G


Collections 28


models 518

datasets 0

None public yet

Collections 28

nightmedia/granite-4.1-30b-mxfp8-mlx

nightmedia/granite-4.1-30b-mxfp4-mlx

nightmedia/granite-4.1-8b-mxfp8-mlx

nightmedia/granite-4.1-3b-mxfp8-mlx

nightmedia/Qwen3.6-35B-A3B-qx86-hi-mlx

nightmedia/Qwen3.6-35B-A3B-Text-qx64-mlx

nightmedia/Qwen3.6-35B-A3B-Holo3-mxfp8-mlx

nightmedia/Qwen3.6-35B-A3B-Architect-Qwopus-mxfp8-mlx

models 518

nightmedia/Qwen3.5-9B-TNG-PKD-Qwopus-Writer-MarkTwain-qx86-hi-mlx

nightmedia/Qwen3.5-9B-Fable-MarkTwain-qx86-hi-mlx

nightmedia/Qwen3.6-35B-A3B-Fable-Holo3-Qwopus

nightmedia/Qwen3.6-35B-A3B-Fable-Holo3-Qwopus-mxfp4-mlx

nightmedia/Qwen3.6-35B-A3B-Fable-Holo3-Qwopus-qx64-hi-mlx

nightmedia/Qwen3.6-35B-A3B-Fable-Holo3-Qwopus-mxfp8-mlx

nightmedia/Qwen3.6-35B-A3B-Fable-Holo3-Qwopus-qx86-hi-mlx

nightmedia/Qwen3.6-35B-A3B-Fable-5-Distill-qx86-hi-mlx

nightmedia/Qwen3.5-9B-TNG-PKD-Qwopus-Coder-Fable-Polaris-Writer-V4-qx86-hi-mlx

nightmedia/Qwen3.5-9B-TNG-PKD-Qwopus-Coder-Qwythos-qx86-hi-mlx
