Any tips how to make this work in ollama?

#12
by Quasimondo - opened

I am trying to run this model with ollama, but it looks like it does not like the separate mmproj file. So I am wondering whether I could take the original (censored) gguf, replace just the LLM-related entries with the uncensored ones, and leave the old vision encoder in place. So I guess my question is: is the vision encoder in this repo identical to the original one, or were changes made to it as well?

Or the other question is: is there a simple way to make this model also work in ollama?

Ollama only supports text mode for qwen35/qwen35moe for now (until a PR that updates the llama.cpp backend is merged). So for now you need to write a Modelfile without the mmproj and create the model from it, more or less like what is shown here (another qwen35 fine-tune).

Edit: the remark above is for third party GGUFs like this one.

Hmm, maybe I am misunderstanding you there, but I could already use the original qwen3.5-9B model for image analysis, OCR or finding bounding boxes in the latest ollama without problems.

Actually, in the meantime I managed to also make this uncensored model work with image input in ollama by using a vibe-coded gguf transplantation script, where I take the original qwen35 9B gguf which ollama had cached and replace all the llm weights with the weights from this gguf but leave the vision encoder untouched. And it seems to work fine.

Sorry, I wasn't clear enough: what I meant is that third-party GGUFs like this one are not fully supported yet, while Ollama's own quantized model (from their repo, what you probably call "original") works. I would go with the custom Modelfile, where I have full control of the parameters. Just my two cents. Regards

The ollama server log shows that this model declares the architecture qwen35, not qwen3.5, and loading fails at that point, so a ComfyUI node receives an error 500. Specifically:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'

It does that whether I try to load just the model (q8 or bf16) or the model built with ollama create plus the mmproj, even though the create itself seems successful.
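To see which architecture string a GGUF declares before handing it to Ollama, the metadata can be read directly from the file. The following is only a minimal sketch, assuming the little-endian GGUF v2/v3 layout (magic, version, tensor count, KV count, then key/value pairs); read_architecture is a hypothetical helper name, not part of Ollama or llama.cpp.

```python
import struct

# Byte sizes of the fixed-width GGUF value types (type id -> size).
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read_string(f):
    """Read a GGUF string: uint64 length followed by UTF-8 bytes."""
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8")


def _read_value(f, vtype):
    """Read one metadata value; scalars are returned as raw bytes (not decoded)."""
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type, count = struct.unpack("<IQ", f.read(12))
        return [_read_value(f, elem_type) for _ in range(count)]
    return f.read(_SCALAR_SIZES[vtype])


def read_architecture(path):
    """Return the value of general.architecture, or None if the key is absent."""
    with open(path, "rb") as f:
        magic, _version, _n_tensors, n_kv = struct.unpack("<4sIQQ", f.read(24))
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        for _ in range(n_kv):
            key = _read_string(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            value = _read_value(f, vtype)
            if key == "general.architecture":
                return value
    return None

# Usage (path is a placeholder):
# print(read_architecture("/path/to/model.gguf"))
```

If this prints qwen35 and the runtime reports "unknown model architecture", the file itself is readable and the loader simply does not know that architecture yet.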

For me this works fine (including vision-related tasks):
ollama pull qwen3.5:9b-q4_K_M
ollama serve

then for example using a python script:

import base64, json, urllib.request

# Base64-encode the image for the "images" field of the chat message
img = base64.b64encode(open("/path/to/image.jpg", "rb").read()).decode()
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps({
        "model": "qwen3.5:9b-q4_K_M", "stream": False, "think": False,
        "messages": [{"role": "user", "content": "Describe this image.", "images": [img]}],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.load(urllib.request.urlopen(req, timeout=120))["message"]["content"])


For using the uncensored model with ollama I vibe-coded this approach (as well as this description of the process):

GGUF vision encoder transplant — what the script does

The goal is to create an uncensored vision-capable model by combining two GGUFs:

  • Source A — the official censored qwen3.5:9b-q4_K_M from Ollama, which has a working vision encoder
  • Source B — Qwen3.5-9B-Uncensored-HauhauCS, a community fine-tune for uncensored text generation, which has no
    vision encoder

The script produces a merged GGUF that has the uncensored language model weights but the original vision encoder
— giving you an uncensored model that can still analyse images.

Step 1 — Read both GGUFs

Both files are opened with GGUFReader. This gives access to all tensors and all key-value metadata fields.

Step 2 — Build tensor mapping

All tensors are split into two groups:

  • Vision tensors (~456) — taken from the censored model as-is
  • Language model tensors (~427) — taken from the uncensored model

One complication: the uncensored model names its SSM bias tensors *.ssm_dt.bias while the censored model calls
them *.ssm_dt. Since the shapes are identical, the script renames them on the fly during the mapping step.
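The mapping step can be sketched on tensor names alone. This is not the actual script, just an illustration under assumptions: the prefixes "v." and "mm." for vision and projector tensors are a guess at how the donor GGUF names them, and build_mapping and normalize_name are hypothetical helper names.

```python
# Assumption: vision encoder / projector tensors in the donor GGUF use these prefixes.
VISION_PREFIXES = ("v.", "mm.")


def normalize_name(name: str) -> str:
    """Rename the fine-tune's '*.ssm_dt.bias' to the donor's '*.ssm_dt'."""
    if name.endswith(".ssm_dt.bias"):
        return name[: -len(".bias")]
    return name


def build_mapping(donor_names, finetune_names):
    """Return {output_name: (source, original_name)} for the merged file."""
    mapping = {}
    # Vision tensors come from the donor (censored) model unchanged.
    for name in donor_names:
        if name.startswith(VISION_PREFIXES):
            mapping[name] = ("donor", name)
    # Language model tensors come from the fine-tune, renamed where needed.
    for name in finetune_names:
        mapping[normalize_name(name)] = ("finetune", name)
    # Every tensor the donor had must be covered, otherwise the merge is incomplete.
    missing = [n for n in donor_names if n not in mapping]
    if missing:
        raise KeyError(f"no source for tensors: {missing}")
    return mapping


donor = ["v.blk.0.attn_q.weight", "mm.0.weight", "blk.0.ssm_dt", "blk.0.attn_norm.weight"]
finetune = ["blk.0.ssm_dt.bias", "blk.0.attn_norm.weight"]
mapping = build_mapping(donor, finetune)
```

The final check is there so that a naming mismatch like the ssm_dt one shows up as an error at mapping time rather than as a broken model at load time.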

Step 3 — Copy all metadata as raw bytes

This is the trickiest part. GGUFWriter's normal API for writing key-value fields is broken for any field that
contains multiple values (arrays) — it only captures the last element. Rather than fighting that, the script
bypasses the writer entirely for metadata and copies all KV fields from the censored model as raw bytes
(b''.join(bytes(p) for p in field.parts)), which gives a byte-perfect copy regardless of field type or
complexity. This preserves things like mrope_sections, image_mean, image_std, tokenizer vocabulary, etc.
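To illustrate why joining the parts gives a byte-perfect copy, here is a small self-contained sketch. It does not use the gguf package: FakeField is a hypothetical stand-in for a reader field whose parts are the consecutive chunks of one KV entry, and the key "example.image_mean" is made up. The type ids (9 for ARRAY, 6 for FLOAT32) follow the GGUF format.

```python
import struct
from dataclasses import dataclass


@dataclass
class FakeField:
    """Hypothetical stand-in for a reader field: one KV entry as consecutive byte chunks."""
    name: str
    parts: list


def raw_kv_bytes(field) -> bytes:
    """Concatenate every part of a field into the exact bytes of its KV entry."""
    return b"".join(bytes(p) for p in field.parts)


def make_float_array_field(key: str, values) -> FakeField:
    """Build the on-disk layout of a FLOAT32 array entry, chunk by chunk."""
    key_bytes = key.encode("utf-8")
    parts = [
        struct.pack("<Q", len(key_bytes)),  # key length (uint64)
        key_bytes,                          # key
        struct.pack("<I", 9),               # value type 9 = ARRAY
        struct.pack("<I", 6),               # element type 6 = FLOAT32
        struct.pack("<Q", len(values)),     # element count (uint64)
    ]
    parts += [struct.pack("<f", v) for v in values]  # one chunk per element
    return FakeField(key, parts)


field = make_float_array_field("example.image_mean", [0.5, 0.5, 0.5])
raw = raw_kv_bytes(field)
```

Because every chunk is concatenated, all array elements end up in the output; code that only looks at the last part of an array field would see just the final element.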

Step 4 — Write the output file

  • GGUFWriter handles the file header and all tensor data
  • general.architecture is written via the normal API (required by the writer)
  • The remaining 51 KV fields are appended as raw bytes
  • The KV count in the file header is patched in-place to reflect the correct total
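The header patch in the last bullet can be sketched as follows. This is an illustration rather than the script itself, assuming the GGUF v2/v3 header layout (4-byte magic, uint32 version, uint64 tensor count, uint64 KV count, all little-endian); patch_kv_count is a hypothetical helper name.

```python
import struct

# Offset of the KV count: 4 (magic) + 4 (version) + 8 (tensor count).
KV_COUNT_OFFSET = 16


def patch_kv_count(path: str, new_count: int) -> int:
    """Overwrite the metadata KV count in the header in place; return the old value."""
    with open(path, "r+b") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        f.seek(KV_COUNT_OFFSET)
        (old_count,) = struct.unpack("<Q", f.read(8))
        f.seek(KV_COUNT_OFFSET)
        f.write(struct.pack("<Q", new_count))
    return old_count

# Usage (path is a placeholder): 1 field written via the API + 51 raw fields = 52
# patch_kv_count("/path/to/merged.gguf", 52)
```

Only the eight count bytes are rewritten, so the tensor count and everything after the header stay untouched.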

Result

The output GGUF is loaded into Ollama as qwen35-uncensored:latest and works for both text and image analysis,
with the uncensored response behaviour of the HauhauCS fine-tune.
ollama create qwen35-uncensored -f [path to a Modelfile whose FROM line points to the merged GGUF]

I was able to run this model on Ollama.

First I tried with

ollama run hf.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive:Q8_0

but it didn't work. So I manually downloaded the Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q8_0.gguf file from the Files tab.
Once it was downloaded, I made a folder and put the .gguf file in it. Then create a new file called "Modelfile" in the same folder, containing something like this:

FROM ./Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q8_0.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|user|>"


TEMPLATE """<|user|>
{{ .Prompt }}<|assistant|>
"""

SYSTEM """You are a helpful AI assistant."""

Once that's done, open cmd in that folder and run

ollama create your_model_name -f Modelfile

Replace your_model_name with the name you want, e.g. qwen_unc.

Once that's done, you should be able to run

ollama run qwen_unc

Unable to process images

If you want to process images you have to use my approach with the gguf surgery. Works fine for me.

It all sounds very logical, but is there any possibility that you could share the script? I've been trying to create it unsuccessfully for quite some time.

Sure, here it is. Use it at your own risk, of course:

https://gist.github.com/Quasimondo/103b7e71e552d077ba2c770e14906417

(I put it in a gist since the forum text formatter butchered it)

Are you saying that the script takes a model as input, uncensors it, and saves it into another folder without any other input?

No - this script just converts this uncensored model into a gguf that can be loaded and used by ollama, including for vision tasks. The model shared here is not compatible with ollama because it keeps the vision encoder in a separate gguf. All my script does is take the censored qwen3.5 gguf and replace its language model weights with the uncensored ones from this repo, while keeping the vision encoder.

I put the gguf and the mmproj in the same directory and referenced them in the Modelfile with FROM ./, but I get: Error 500 Internal Server Error: unable to load model:

Have you uploaded it publicly to ollama? Would be very kind because I don't want to do it manually.
