The tokenizer you are loading from '/var/lib/gpustack/cache/model_scope/cyankiwi/Qwen3-Next-80B-A3B-Instruct-AWQ-4bit' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the fix_mistral_regex=True flag when loading this tokenizer to fix this issue.
(APIServer pid=27) INFO 01-18 20:37:42 [chat_utils.py:590] Detected the chat template content format to be 'string'. You can set --chat-template-content-format to override this.
(APIServer pid=27)
Is this a problem with the transformers version? I am using v4, but I see the config specifies v5.
Thanks for raising this. It seems this happens when the model is loaded from the local directory /var/lib/gpustack/cache/model_scope/cyankiwi/Qwen3-Next-80B-A3B-Instruct-AWQ-4bit rather than from the Hugging Face repo id, i.e., cyankiwi/Qwen3-Next-80B-A3B-Instruct-AWQ-4bit.
In my environment, it also happens with other models, including the original BF16 models.
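When loading the tokenizer directly, passing the flag that the warning names may suppress the issue. A minimal sketch, assuming a transformers version that accepts `fix_mistral_regex` as a `from_pretrained` keyword (the warning names the flag verbatim; `load_tokenizer` and `tokenizer_kwargs` are hypothetical helpers, not part of any library):

```python
def tokenizer_kwargs(fix_regex: bool) -> dict:
    # fix_mistral_regex=True is the flag named verbatim in the warning.
    return {"fix_mistral_regex": True} if fix_regex else {}


def load_tokenizer(model_path: str, fix_regex: bool = True):
    # Lazy import so the helper above stays usable without transformers installed.
    from transformers import AutoTokenizer
    return AutoTokenizer.from_pretrained(model_path, **tokenizer_kwargs(fix_regex))


# Hypothetical usage with the local cache path from the log:
# tok = load_tokenizer(
#     "/var/lib/gpustack/cache/model_scope/cyankiwi/Qwen3-Next-80B-A3B-Instruct-AWQ-4bit"
# )
```

Whether vLLM forwards such tokenizer kwargs from its CLI is a separate question; this only covers loading the tokenizer yourself to check tokenization.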