Mitko Vasilev
mitkox
AI & ML interests
Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
Recent Activity
posted an update about 23 hours ago
I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.
With local AI, I don't have a /fast CC switch, but I have /absurdlyfast:
- 100,499 tokens/second read, yeah 100k, not a typo | 811 tok/sec generation
- KV cache: 707,200 tokens
- Hardware: 4x A6K gen1 GPUs, 5+ years old. It's not the car, it's the driver.
Qwen3 Coder Next AWQ with the KV cache at BF16. It scores 82.1% in C# on a 29-years-in-development codebase vs Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.
My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.
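Client-side, a swarm like this is mostly just concurrent requests against one local endpoint, with vLLM batching them for throughput. A minimal sketch, assuming a vLLM server on its default port; the endpoint URL, served-model name, and agent prompts are placeholders, not the actual setup:

```python
# Hypothetical sketch: fan 20 coding-agent requests out to one local
# OpenAI-compatible vLLM endpoint and let the server batch them.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

async def run_agent(agent_id: int, task: str) -> str:
    # Each "agent" here is a single independent chat completion; a real
    # swarm would also carry per-agent history and tool calls.
    resp = await client.chat.completions.create(
        model="qwen3-coder-next-awq",  # hypothetical served-model name
        messages=[
            {"role": "system", "content": f"You are coding agent #{agent_id}."},
            {"role": "user", "content": task},
        ],
        max_tokens=512,
    )
    return resp.choices[0].message.content or ""

async def main() -> None:
    tasks = [run_agent(i, f"Review module {i} and summarize issues.") for i in range(20)]
    for i, result in enumerate(await asyncio.gather(*tasks)):
        print(f"--- agent {i} ---\n{result[:200]}")

asyncio.run(main())
```

The aggregate tok/sec numbers come from this batching: a single stream never sees them, twenty concurrent ones do.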
posted an update 11 days ago
[Claude Code logo]  Claude Code v2.1.23
                    Kimi-K2.5 · API Usage Billing
                    ~/dev/vllm

/model to try Opus 4.5

❯ hey
⏺ Hello! How can I help you today?
❯ what model are you?
⏺ I'm Claude Kimi-K2.5, running in a local environment on Linux.
Took some time to download, plus some vLLM hybrid-inference magic, to get it running on my desktop workstation.
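The same local endpoint answers any OpenAI-compatible client, so you can reproduce that exchange outside Claude Code too. A minimal sketch, assuming vLLM's default port and a placeholder served-model name:

```python
# Sketch: ask the locally served model to identify itself.
# base_url and model name are assumptions, not the actual config.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="kimi-k2.5",  # hypothetical served-model name
    messages=[{"role": "user", "content": "what model are you?"}],
)
print(resp.choices[0].message.content)
```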
posted an update 19 days ago
GLM-4.7-Flash is fast, good and cheap.
3,074 tokens/sec peak at a 200k-token context window on my desktop PC.
Works with Claude Code and opencode for hours with no errors. A drop-in replacement for the Anthropic cloud AI.
MIT licensed, open weights, free for commercial use and modifications.
Supports speculative decoding via MTP (multi-token prediction), which is highly effective at mitigating latency.
Great for on-device AI coding as an AWQ 4-bit quant at 18.5 GB, with hybrid inference on a single consumer GPU plus CPU RAM.
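If you want to sanity-check decode speed on a setup like this, counting streamed chunks is a rough but serviceable proxy for tokens. A sketch against an assumed local OpenAI-compatible endpoint with a placeholder model name; it measures a single stream, so it won't approach the batched peak above:

```python
# Rough single-stream decode-speed check against a local vLLM server.
# base_url and model name are assumptions.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

start = time.perf_counter()
tokens = 0
stream = client.chat.completions.create(
    model="glm-4.7-flash",  # hypothetical served-model name
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    max_tokens=512,
    stream=True,
)
for chunk in stream:
    # Each content-bearing chunk is roughly one token.
    if chunk.choices and chunk.choices[0].delta.content:
        tokens += 1
elapsed = time.perf_counter() - start
print(f"~{tokens / elapsed:.0f} tok/s single-stream decode")
```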