Man Cub
mancub
AI & ML interests
None yet
Recent Activity
new activity about 10 hours ago
z-lab/Qwen3.6-27B-DFlash: Are we going to see an update to this model?
new activity 3 days ago
Intel/gemma-4-31B-it-int4-AutoRound: INT8 version for TP=2 / dual Ampere GPUs?
Organizations
None yet
Are we going to see an update to this model?
#15 opened about 10 hours ago
by
mancub
INT8 version for TP=2 / dual Ampere GPUs?
🚀 1
1
#6 opened 22 days ago
by
mancub
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
3
#10 opened 21 days ago
by
mancub
final v16 does not appear to work correctly, it stops after the first prompt.
🔥 2
2
#19 opened 14 days ago
by
mancub
v13 stops dead after the first response
👍 2
5
#14 opened 16 days ago
by
mancub
Crashes with newest vllm version (v0.20.1)
15
#1 opened 24 days ago
by
Neiko2002
v11/v12 performance considerations with Claude Code?
3
#11 opened 18 days ago
by
mancub
When using Claude Code, tool calls end up broken with this chat template in Qwen3.6-27B
7
#6 opened 19 days ago
by
mancub
Good quant!
12
#1 opened 26 days ago
by
qenme
Does not appear to work with the new google drafter MTP model
#2 opened 22 days ago
by
mancub
Is it supposed to work in vllm?
1
#2 opened 22 days ago
by
mancub
Avg Draft acceptance rate is low.
17
#2 opened about 1 month ago
by
fouvy
OOM and context limits reached too soon
1
#5 opened about 1 month ago
by
mancub
Unable to run on 3090
1
#1 opened about 1 month ago
by
mancub
How to split this model between 2 (3) GPUs and CPU/RAM ?
30
#12 opened 2 months ago
by
mancub
My personal vLLM launch cmd on my old personal 2x3090 workstation
7
#1 opened 3 months ago
by
tclf90
What was just updated and why?
👍 1
2
#1 opened about 2 months ago
by
mancub
How to use it with llama-server ?
👀 1
3
#1 opened 2 months ago
by
mancub
Poor performance and pretty lobotomized
2
#1 opened 2 months ago
by
mancub