Any tips on how to make this work in ollama?
I am trying to run this model with ollama, but it does not seem to accept the separate mmproj file. Is there some way to take the original (censored) gguf, replace just the LLM-related entries with the uncensored ones, and leave the old vision encoder in place? So I guess my question is whether the vision encoder in this repo is identical to the original one, or whether changes were made to it as well.
Or, put differently: is there a simple way to make this model work in ollama too?
Ollama only supports text mode for qwen35/qwen35moe for now (until a PR that updates the llama.cpp backend is merged). So for now, you need to write a Modelfile without the mmproj and create the model from it, more or less like what is shown here (another qwen35 model finetune).
Edit: the remark above is for third party GGUFs like this one.
Hmm, maybe I am misunderstanding you there, but I could already use the original qwen3.5-9B model for image analysis, OCR or finding bounding boxes in the latest ollama without problems.
Actually, in the meantime I managed to also make this uncensored model work with image input in ollama by using a vibe-coded gguf transplantation script, where I take the original qwen35 9B gguf which ollama had cached and replace all the llm weights with the weights from this gguf but leave the vision encoder untouched. And it seems to work fine.
Sorry, I wasn't clear enough: what I meant is that third party GGUFs like this one are not fully supported yet; their own quantized model (from their repo, what you probably call "original") works. I would go with the custom Modelfile, where I have full control of the parameters. Just my two cents. Regards
The ollama server log shows that this model declares the architecture qwen35, not qwen3.5, and loading fails at that point, so a ComfyUI node receives an error 500. Specifically:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
It does that whether I try to load just the model (q8 or bf16) or the model built by ollama create with the mmproj added, even though the create itself seems successful.
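To confirm which architecture string a given GGUF actually declares before handing it to ollama, the header can be read directly. The following is a minimal sketch using only the standard library. It assumes a little-endian GGUF v2/v3 file, and the function names are made up for illustration; they are not part of ollama or llama.cpp.

```python
import struct

# Byte sizes of the fixed-width GGUF value types (type id -> size in bytes).
_FIXED = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read_string(f):
    # GGUF strings are a uint64 length followed by that many bytes.
    (n,) = struct.unpack("<Q", f.read(8))
    return f.read(n).decode("utf-8", errors="replace")


def _skip_value(f, vtype):
    # Advance the file position past one value of the given type.
    if vtype in _FIXED:
        f.seek(_FIXED[vtype], 1)
    elif vtype == _STRING:
        (n,) = struct.unpack("<Q", f.read(8))
        f.seek(n, 1)
    elif vtype == _ARRAY:
        item_type, count = struct.unpack("<IQ", f.read(12))
        if item_type in _FIXED:
            f.seek(_FIXED[item_type] * count, 1)
        else:
            for _ in range(count):
                _skip_value(f, item_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


def read_architecture(f):
    """Return the general.architecture string of an open GGUF file, or None."""
    magic, _version, _n_tensors, n_kv = struct.unpack("<4sIQQ", f.read(24))
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    for _ in range(n_kv):
        key = _read_string(f)
        (vtype,) = struct.unpack("<I", f.read(4))
        if key == "general.architecture" and vtype == _STRING:
            return _read_string(f)
        _skip_value(f, vtype)
    return None
```

Usage would look like `with open("model.gguf", "rb") as f: print(read_architecture(f))`, where `model.gguf` is a placeholder path. If this prints an architecture that the ollama build in use does not know, the "unknown model architecture" error above is what one would expect.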
For me this works fine (including vision-related tasks):
ollama pull qwen3.5:9b-q4_K_M
ollama serve
Then, for example, using a Python script:
import base64, json, urllib.request
img = base64.b64encode(open("/path/to/image.jpg", "rb").read()).decode()
req = urllib.request.Request("http://localhost:11434/api/chat",
    data=json.dumps({"model": "qwen3.5:9b-q4_K_M", "stream": False, "think": False,
        "messages": [{"role": "user", "content": "Describe this image.", "images": [img]}]
    }).encode(), headers={"Content-Type": "application/json"})
print(json.load(urllib.request.urlopen(req, timeout=120))["message"]["content"])
For using the uncensored model with ollama I vibe-coded this approach (as well as this description of the process):
GGUF vision encoder transplant — what the script does
The goal is to create an uncensored vision-capable model by combining two GGUFs:
- Source A — the official censored qwen3.5:9b-q4_K_M from Ollama, which has a working vision encoder
- Source B — Qwen3.5-9B-Uncensored-HauhauCS, a community fine-tune for uncensored text generation, which has no
vision encoder
The script produces a merged GGUF that has the uncensored language model weights but the original vision encoder
— giving you an uncensored model that can still analyse images.
Step 1 — Read both GGUFs
Both files are opened with GGUFReader. This gives access to all tensors and all key-value metadata fields.
Step 2 — Build tensor mapping
All tensors are split into two groups:
- Vision tensors (~456) — taken from the censored model as-is
- Language model tensors (~427) — taken from the uncensored model
One complication: the uncensored model names its SSM bias tensors *.ssm_dt.bias while the censored model calls
them *.ssm_dt. Since the shapes are identical, the script renames them on the fly during the mapping step.
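The mapping in Step 2 can be sketched on tensor names alone. This is a hedged illustration, not the script from this thread: the vision prefixes `v.` and `mm.` are an assumption about how the vision tensors are named, and `build_tensor_plan` is a hypothetical helper.

```python
def build_tensor_plan(censored_names, uncensored_names, vision_prefixes=("v.", "mm.")):
    """Decide, per output tensor, which file it is taken from.

    Returns {output_name: (source, source_name)} in the order of the
    censored file, so the merged file keeps the layout ollama already loads.
    The vision prefixes are an assumption, not confirmed by the thread.
    """
    prefixes = tuple(vision_prefixes)

    def normalise(name):
        # The uncensored file uses *.ssm_dt.bias where the censored one uses *.ssm_dt.
        if name.endswith(".ssm_dt.bias"):
            return name[: -len(".bias")]
        return name

    # Map the censored-style name to the name used inside the uncensored file.
    replacement = {normalise(n): n for n in uncensored_names}

    plan, missing = {}, []
    for name in censored_names:
        if name.startswith(prefixes):
            plan[name] = ("censored", name)  # vision tensor, kept as-is
        elif name in replacement:
            plan[name] = ("uncensored", replacement[name])  # language model tensor
        else:
            missing.append(name)
    if missing:
        raise KeyError(f"no uncensored counterpart for: {missing}")
    return plan
```

Failing loudly on a missing counterpart is a deliberate choice here: a silently skipped tensor would produce a file that loads but mixes censored and uncensored weights.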
Step 3 — Copy all metadata as raw bytes
This is the trickiest part. In my attempts, writing key-value fields through GGUFWriter's normal API went wrong for any field that
contains multiple values (arrays): only the last element was captured. Rather than fighting that, the script
bypasses the writer entirely for metadata and copies all KV fields from the censored model as raw bytes
(b''.join(bytes(p) for p in field.parts)), which gives a byte-perfect copy regardless of field type or
complexity. This preserves things like mrope_sections, image_mean, image_std, tokenizer vocabulary, etc.
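To illustrate why joining the parts gives a complete record, here is a small stand-alone sketch. It does not use GGUFReader; the `parts` list is a hand-built stand-in for what `field.parts` is assumed to hold for an array-valued field (key length, key, value type, item type, item count, items), and the key name is hypothetical.

```python
import struct

GGUF_TYPE_UINT32 = 4
GGUF_TYPE_ARRAY = 9


def raw_kv_bytes(parts):
    """Concatenate the parts of one KV field into its on-disk record."""
    return b"".join(bytes(p) for p in parts)


key = b"example.sections"  # hypothetical key, for illustration only
values = (11, 11, 10, 0)   # made-up array contents

# Stand-in for field.parts: every piece of the record, in file order.
parts = [
    struct.pack("<Q", len(key)),            # key length (uint64)
    key,                                    # key bytes
    struct.pack("<I", GGUF_TYPE_ARRAY),     # value type: array
    struct.pack("<I", GGUF_TYPE_UINT32),    # array item type
    struct.pack("<Q", len(values)),         # array item count
    struct.pack("<4I", *values),            # all four items, not just the last
]

record = raw_kv_bytes(parts)
```

Because the record is copied byte for byte, every element of the array survives, and nothing needs to know how to interpret the field's type.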
Step 4 — Write the output file
- GGUFWriter handles the file header and all tensor data
- general.architecture is written via the normal API (required by the writer)
- The remaining 51 KV fields are appended as raw bytes
- The KV count in the file header is patched in-place to reflect the correct total
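The in-place header patch from the last bullet can be sketched as follows. This assumes a little-endian GGUF v2/v3 header (4-byte magic, uint32 version, uint64 tensor count, uint64 KV count); `patch_kv_count` is a hypothetical helper, not a function from the gguf package.

```python
import struct

# Offset of the KV count: magic (4) + version (4) + tensor count (8).
KV_COUNT_OFFSET = 16


def patch_kv_count(path, new_count):
    """Overwrite the KV count in a GGUF header and return the old value."""
    with open(path, "r+b") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        f.seek(KV_COUNT_OFFSET)
        (old_count,) = struct.unpack("<Q", f.read(8))
        f.seek(KV_COUNT_OFFSET)
        f.write(struct.pack("<Q", new_count))
    return old_count
```

With the numbers from this post, the call would be something like `patch_kv_count("merged.gguf", 52)`: one field written through the normal API plus the 51 appended as raw bytes. The file name is a placeholder.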
Result
The output GGUF is loaded into Ollama as qwen35-uncensored:latest and works for both text and image analysis,
with the uncensored response behaviour of the HauhauCS fine-tune.
ollama create qwen35-uncensored -f [path to merged model file]
I was able to run this model on Ollama.
First I tried with
ollama run hf.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive:Q8_0
but it didn't work, so I manually downloaded the Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q8_0.gguf file from the repo's files.
Once it was downloaded, I made a folder and put the .gguf file in there. Then create a new file called "Modelfile" in the same folder, containing something like this:
FROM ./Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q8_0.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|user|>"
TEMPLATE """<|user|>
{{ .Prompt }}<|assistant|>
"""
SYSTEM """You are a helpful AI assistant."""
Once that's done, open cmd in that folder and run
ollama create your_model_name -f Modelfile
Replace your_model_name with the name you want, e.g. qwen_unc.
Once that's done, you should be able to run
ollama run qwen_unc
Unable to process images
If you want to process images you have to use my approach with the gguf surgery. Works fine for me.
It all sounds very logical, but is there any possibility that you could share the script? I've been trying to create it unsuccessfully for quite some time.
Sure, here it is. Use it at your own risk, of course:
https://gist.github.com/Quasimondo/103b7e71e552d077ba2c770e14906417
(I put it in a gist since the forum text formatter butchered it.)
Are you saying that the script takes a model as input, uncensors it, and saves it into another folder without any other input?
No, this script just converts this uncensored model to a gguf that can be loaded and used by ollama, including for vision tasks. The model shared here is not compatible with ollama as-is, since it ships the vision encoder as a separate gguf. All my script does is take the censored qwen3.5 gguf and replace the LLM weights with the uncensored ones from this repo whilst keeping the vision encoder.
I put the gguf and the mmproj in the same directory and referenced them from the Modelfile with FROM ./, but I get: Error 500 Internal Server Error: unable to load model:
Have you uploaded it publicly to ollama? Would be very kind because I don't want to do it manually.