The tokenizer you are loading from 'cyankiwi/Qwen3-Next-80B-A3B-Instruct-AWQ-4bit' with an incorrect regex pattern
1 reply
#14 opened 3 months ago by mxdlzg
Performance of this model is one of the best
2 replies
#13 opened 4 months ago by Geximus
Performance evaluation for v1.0.0 Model
3 replies
#12 opened 4 months ago by woodytse
Bloody hell!! Running perfectly on 3x 3090 at 160k context, speeds between 65 tk/s and 30 tk/s (depending on length); my script:
#11 opened 6 months ago by groxaxo
Did anyone get speculative decode working?
4 replies
#10 opened 6 months ago by amit864
Successfully Running Qwen3-Next-80B-A3B-Instruct-AWQ-4bit on 3x RTX 3090s
7 replies
#9 opened 6 months ago by 8055izham
Sorta works on vLLM now
15 replies
#8 opened 7 months ago by MrDragonFox
Recent update throws error: KeyError: 'layers.30.mlp.shared_expert.down_proj.weight'
3 replies
#7 opened 7 months ago by itsmebcc
Gibberish still persists?
5 replies
#6 opened 7 months ago by Geximus
MTP Accepted throughput always at 0.00 tokens/s
4 replies
#5 opened 7 months ago by bpozdena
Experiencing excessive response latency.
#4 opened 7 months ago by JunHowie
Does this quantized version support running on machines like V100 and V100S?
#3 opened 7 months ago by ShaoShuoHe
Error on inputting lots of prompts
#2 opened 7 months ago by dwaynedu
Error when running in vLLM
21 replies
#1 opened 7 months ago by d8rt8v