All HF Hub posts

IlyasMoutawwakil 
posted an update 2 days ago
Transformers v5 just landed! 🚀
It significantly unifies and reduces modeling code across architectures, while opening the door to a whole new class of performance optimizations.

My favorite new feature? 🤔
The new dynamic weight loader + converter. Here’s why 👇

Over the last few months, the core Transformers maintainers built an incredibly fast weight loader, capable of converting tensors on the fly while loading them in parallel threads. This means we’re no longer constrained by how parameters are laid out inside the safetensors weight files.
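To make that concrete, here is a rough sketch of the idea (not the actual Transformers v5 loader, just the concept): load safetensors shards in parallel threads and apply a conversion function to each tensor on the fly.

```python
# Conceptual sketch only: parallel tensor loading with on-the-fly conversion.
# This is NOT the Transformers v5 implementation, just an illustration.
from concurrent.futures import ThreadPoolExecutor

from safetensors import safe_open


def load_and_convert(path: str, convert, num_threads: int = 8) -> dict:
    """Load every tensor in a safetensors file, applying `convert` as each one arrives."""
    with safe_open(path, framework="pt") as f:
        names = list(f.keys())

    def load_one(name):
        # safetensors memory-maps the file, so per-thread handles are cheap.
        with safe_open(path, framework="pt") as f:
            return name, convert(name, f.get_tensor(name))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return dict(pool.map(load_one, names))


# Example converter: upcast norm weights, leave everything else untouched.
state_dict = load_and_convert(
    "model.safetensors",
    convert=lambda name, t: t.float() if "norm" in name else t,
)
```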

In practice, this unlocks two big things:
- Much more modular modeling code. You can now clearly see how architectures build on top of each other (DeepSeek v2 → v3, Qwen v2 → v3 → MoE, etc.). This makes shared bottlenecks obvious and lets us optimize the right building blocks once, for all model families.
- Performance optimizations beyond what torch.compile can do alone. torch.compile operates on the computation graph, but it can’t change parameter layouts. With the new loader, we can restructure weights at load time: fusing MoE expert projections, merging attention QKV projections, and enabling more compute-dense kernels that simply weren’t possible before.
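
For instance, fusing the attention QKV projections can be expressed as a pure load-time rewrite of the state dict. The parameter names below are hypothetical and this is not the v5 converter API, just an illustration of the kind of restructuring the loader enables:

```python
# Illustrative load-time restructuring: merge separate Q/K/V projection weights
# into one fused projection so a single matmul replaces three smaller ones.
# Parameter names are hypothetical, not the actual v5 converter API.
import torch


def fuse_qkv(state_dict: dict, layer: str) -> None:
    q = state_dict.pop(f"{layer}.q_proj.weight")
    k = state_dict.pop(f"{layer}.k_proj.weight")
    v = state_dict.pop(f"{layer}.v_proj.weight")
    # Concatenate along the output dimension; torch.compile cannot do this,
    # because it changes the parameter layout rather than the compute graph.
    state_dict[f"{layer}.qkv_proj.weight"] = torch.cat([q, k, v], dim=0)
```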

Personally, I'm honored to have contributed in this direction, including the work on optimizing MoE implementations and making modeling code more torch-exportable, so these optimizations can be ported cleanly across runtimes.

Overall, Transformers v5 is a strong signal of where the community and industry are converging: Modularity and Performance, without sacrificing Flexibility.

Transformers v5 turns its signature from_pretrained into a single entrypoint where you can mix and match:
- Parallelism
- Quantization
- Custom kernels
- Flash/Paged attention
- Continuous batching
- ...
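
For illustration, a typical call might combine several of these. The argument names below exist in recent transformers releases, but availability and exact values depend on your version, hardware, and installed extras; the model id is just an example.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Example only: any causal LM on the Hub works; flash-attn and bitsandbytes
# need to be installed for these particular options.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    dtype=torch.bfloat16,                      # compute dtype
    device_map="auto",                         # place shards across available devices
    attn_implementation="flash_attention_2",   # or "sdpa", "eager", ...
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
```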

Kudos to everyone involved! I highly recommend reading:
Release notes: https://github.com/huggingface/transformers/releases/tag/v5.0.0
Blog post: https://huggingface.co/blog/transformers-v5
RakshitAralimatti 
posted an update about 17 hours ago
Just built my entire AI Engineer portfolio by pasting 2 links (GitHub and LinkedIn) into moonshotai's Kimi 2.5.
That's it. That's the workflow.
Zero coding. Zero iteration. Zero "make the button bigger."
See for yourself: https://rakshit2020.github.io/rakshitaralimatti.github.io/

The model:
✅ Scraped my GitHub repos automatically
✅ Pulled my experience from LinkedIn
✅ Designed an Aurora Glass theme
✅ Mapped every skill to projects
✅ Added animations I'd never code myself


alvarobartt 
posted an update about 18 hours ago
💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
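
For reference, the KV cache size boils down to the usual back-of-envelope formula. This is the standard arithmetic, not necessarily hf-mem's exact implementation:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   max_model_len, batch_size, dtype_bytes=2):
    # 2 = one K and one V entry per token, per layer, per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * max_model_len * batch_size


# A Llama-3.1-8B-like config (32 layers, 8 KV heads, head_dim 128)
# at 8k context, batch size 1, bf16 -> exactly 1 GiB of KV cache.
print(kv_cache_bytes(32, 8, 128, 8192, 1) / 2**30)
```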
danieldk 
posted an update 1 day ago
kernels 0.12 is out! 🎉

Changes:

* Support for kernel version branches to gracefully roll out kernel API changes.
* Support for PyTorch 2.10.
* kernel-builder is now merged into the kernels repo.
* Initial support for standardized kernel benchmarks.

https://github.com/huggingface/kernels/releases/tag/v0.12.0
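
If you haven't tried it yet, fetching a Hub kernel is a one-liner. Minimal sketch below: the repo id is just an example, and the `version` argument hinted at in the comment (for the new version branches) is an assumption about the 0.12 API rather than a confirmed signature.

```python
# Minimal sketch: pull a prebuilt kernel from the Hub and call it.
import torch
from kernels import get_kernel

# Downloads the build matching your torch/CUDA setup from the Hub.
activation = get_kernel("kernels-community/activation")  # optionally pin a version branch

x = torch.randn((16, 16), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # writes the result into `y` in place
```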
imnotkitty 
posted an update 2 days ago
📌 Same day, two releases.

Jan 27th just got interesting for open-source AI models.
✅Kimi K2.5: How to make models "think" across text and vision natively?
moonshotai/Kimi-K2.5
✅DeepSeek-OCR 2: How to make models "see" more like humans, not scanners?
deepseek-ai/DeepSeek-OCR-2

One focuses on depth of reasoning, the other on precision of vision.
What's the key differentiator for a multimodal model in your view: raw power or computational elegance?
kanaria007 
posted an update 2 days ago
✅ New Article: *Post-Transformer Decision Cores* (v0.1)

Title:
🚀 Post-Transformer Decision Cores: Goal-Native Engines Beyond LLMs
🔗 https://huggingface.co/blog/kanaria007/post-tranformer-decision-cores

---

Summary:
Transformers are powerful—but in SI-Core they’re *not the essence of intelligence*. A *Decision Core* is anything that satisfies the *Jump contracts* (OBS/ETH/MEM/ID/EVAL + RML), and those contracts don’t require next-token prediction.

This article sketches what “post-Transformer” looks like in practice: *goal-native, structure-aware controllers* that may use LLMs as tools—but don’t depend on them as the runtime brain.

> Don’t relax the contracts.
> Replace the engine behind them.

---

Why It Matters:
• Makes LLMs *optional*: shift them to “genesis / exploration / explanation,” while routine high-stakes Jumps run on structured cores
• Improves boring-but-critical properties: *determinism (CAS), fewer inconsistencies (SCI), fewer ETH violations (EAI), better rollback (RBL/RIR)*
• Enables gradual adoption via *pluggable Jump engines* and domain-by-domain “primary vs fallback” switching
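
Purely as an illustration of the "pluggable Jump engines / primary vs fallback" idea, a minimal sketch follows; every name in it is hypothetical and not part of any SI-Core API:

```python
# Hypothetical sketch of pluggable decision engines with primary/fallback switching.
from typing import Protocol


class DecisionCore(Protocol):
    def jump(self, observation: dict) -> dict | None:
        """Map an observation to a decision, or return None to abstain."""
        ...


class SearchCore:
    """Structure-aware controller (e.g. an MPC/MCTS-style planner) used as primary."""
    def jump(self, observation: dict) -> dict | None:
        return {"action": "plan", "engine": "search"}


class LLMCore:
    """LLM-backed engine kept as a fallback for unfamiliar situations."""
    def jump(self, observation: dict) -> dict | None:
        return {"action": "ask_llm", "engine": "llm"}


def decide(observation: dict, primary: DecisionCore, fallback: DecisionCore) -> dict:
    # Domain-by-domain switching: run the primary, fall back only when it abstains.
    return primary.jump(observation) or fallback.jump(observation)
```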

---

What’s Inside:
• The architectural inversion: *World → OBS → SIM/SIS → Jump (Decision Core) → RML → Effects* (LLM is just one engine)
• Three compatible post-Transformer directions:

1. *World-model + search controllers* (MPC/MCTS/anytime search with explicit GCS + ETH constraints)
2. *Genius-distilled specialized controllers* (distill structure from GeniusTraces; LLM becomes a “genesis tool”)
3. *SIL-compiled Decision Programs* (typed Jump entrypoints, compiler-checked invariants, DPIR/GSPU targeting)
• A realistic migration path: LLM-wrapped → Genius library → shadow dual-run → flip primary by domain → SIL-compiled cores
• How this connects to “reproducing genius”: GRP provides trace selection/format; this article provides the engine architectures

---

📖 Structured Intelligence Engineering Series