ZeroGPU Explorers

community

Recent Activity

LXT submitted a paper 19 days ago

SAMTok: Representing Any Mask with Two Words

gagan3012 authored a paper 28 days ago

From RAG to Agentic RAG for Faithful Islamic Question Answering

gagan3012 authored a paper 28 days ago

Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics

julien-c

submitted a paper to Daily Papers 11 days ago

Shaping capabilities with token-level data filtering

Paper • 2601.21571 • Published 12 days ago • 25

innovation64

authored a paper 12 days ago

BMAM: Brain-inspired Multi-Agent Memory Framework

Paper • 2601.20465 • Published 13 days ago • 4

innovation64

submitted a paper to Daily Papers 12 days ago

BMAM: Brain-inspired Multi-Agent Memory Framework

Paper • 2601.20465 • Published 13 days ago • 4

codelion

posted an update 18 days ago

Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models

I wrote a deep dive into how Magic AI's 100M token context window might work, starting from their HashHop benchmark and building up to MALM - a Memory-Augmented Language Model.

Key insight: treating each key as a single token enables perfect retrieval at unlimited context lengths.

The article covers:

- How HashHop works and why its perfect accuracy is suspicious
- Building a tokenized solver that achieves 100% accuracy
- Scaling to MALM for real code search tasks
- Why this approach could handle 100M+ tokens
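As a rough illustration of the key insight above, here is a minimal Python sketch. It is not the article's solver. It assumes the context is a set of `key = value` hash pairs, one per line (a hypothetical format), and it treats each hash as one atomic dictionary key, so following a chain is an exact lookup regardless of context length. The names `build_table` and `follow_hops` are made up for this example.

```python
def build_table(context: str) -> dict[str, str]:
    """Map each hash (treated as a single atomic key) to the hash it points to."""
    table: dict[str, str] = {}
    for line in context.splitlines():
        if "=" not in line:
            continue  # skip filler lines that are not hash pairs
        key, value = (part.strip() for part in line.split("=", 1))
        table[key] = value
    return table


def follow_hops(table: dict[str, str], start: str, hops: int) -> str:
    """Follow the chain from `start` for `hops` exact lookups."""
    current = start
    for _ in range(hops):
        current = table[current]  # exact match, never a fuzzy comparison
    return current


# Hypothetical three-hop chain hidden among filler text.
example_context = "aX3 = kP9\nsome filler text\nkP9 = Zq2\nZq2 = mm7"
example_table = build_table(example_context)
final_hash = follow_hops(example_table, "aX3", 3)  # "mm7"
```

Each hop costs one dictionary lookup, so accuracy does not degrade as the context grows. In this sketch that is the reason single-token keys make retrieval exact.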

Read the full article: https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop

Try the model: codelion/malm-165m

Code: https://github.com/codelion/hash-hop

LXT

submitted a paper to Daily Papers 19 days ago

SAMTok: Representing Any Mask with Two Words

Paper • 2601.16093 • Published 19 days ago • 41

BK-Lee

authored a paper about 1 month ago

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models

Paper • 2512.22238 • Published Dec 23, 2025 • 27

BK-Lee

submitted a paper to Daily Papers about 1 month ago

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models

Paper • 2512.22238 • Published Dec 23, 2025 • 27

codelion

posted an update about 2 months ago

Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:

→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
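To make the token budget concrete, the sketch below works out the split implied by the figures above. It assumes the 50-30-20 percentages map to the sources in the order listed (PDFs, filtered web, educational content). The post does not state that mapping explicitly, and the names `MIX_PERCENT` and `token_budget` are hypothetical.

```python
TOTAL_TOKENS = 1_000_000_000      # 1B pretraining tokens, from the post
CONVERSION_TOKENS = 100_000_000   # 100M extra tokens for the diffusion conversion

# Assumption: percentages follow the order in which the sources are listed.
MIX_PERCENT = {"pdfs": 50, "filtered_web": 30, "educational": 20}


def token_budget(total: int, mix_percent: dict[str, int]) -> dict[str, int]:
    """Split a total token count according to integer percentages."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    return {name: total * pct // 100 for name, pct in mix_percent.items()}


budget = token_budget(TOTAL_TOKENS, MIX_PERCENT)
conversion_share = CONVERSION_TOKENS / TOTAL_TOKENS
```

Under that assumption the split is 500M, 300M, and 200M tokens, and the conversion stage adds one tenth of the pretraining budget.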

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: codelion/dhara-70m

ShoufaChen

authored a paper about 2 months ago

HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

Paper • 2512.21338 • Published Dec 24, 2025 • 22

codelion

posted an update about 2 months ago

Introducing PTS Visualizer - an interactive tool for exploring how language models reason!

Visualize pivotal tokens, thought anchors, and reasoning circuits. See which tokens and sentences significantly impact success probability, explore embedding clusters, and trace reasoning step-by-step.

Try it: codelion/pts-visualizer

Explore PTS datasets:
- Qwen3-0.6B: codelion/Qwen3-0.6B-pts
- DeepSeek-R1: codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts

Or upload your own JSONL files!

GitHub: https://github.com/codelion/pts