geronimo

g-ronimo

https://medium.com/@geronimo7

geronimi73

AI & ML interests

fafo

Recent Activity

upvoted an article 6 days ago

PRX Part 3 — Training a Text-to-Image Model in 24h!

liked a model 7 days ago

onnx-community/Qwen3.5-0.8B-ONNX

liked a model 13 days ago

Qwen/Qwen3.5-27B

upvoted an article 6 days ago

Article

PRX Part 3 — Training a Text-to-Image Model in 24h!

6 days ago

upvoted an article 17 days ago

Article

Training Design for Text-to-Image Models: Lessons from Ablations

Feb 3

upvoted an article 23 days ago

Article

Custom Kernels for All from Codex and Claude

25 days ago

upvoted an article about 2 months ago

Article

Swift Transformers Reaches 1.0 – and Looks to the Future

Sep 26, 2025

upvoted an article 2 months ago

Article

Small Yet Mighty: Improve Accuracy In Multimodal Search and Visual Document Retrieval with Llama Nemotron RAG Models

Jan 6

upvoted 4 articles 3 months ago

Article

Continuous batching from first principles

Nov 25, 2025 • 342

Article

Text-to-image Architectural Experiments

Nov 13, 2025

Article

We’re open-sourcing our text-to-image model and the process behind it

Nov 12, 2025

Article

Diffusers welcomes FLUX-2

Nov 25, 2025 • 180

upvoted a collection 5 months ago

Granite Embedding Models

Collection

7 items • Updated Nov 17, 2025 • 32

upvoted a paper 7 months ago

MetaCLIP 2: A Worldwide Scaling Recipe

Paper • 2507.22062 • Published Jul 29, 2025 • 37

upvoted 2 articles 8 months ago

Article

Extending Transformer layers as Painters to DiT's

Aug 31, 2024

Article

LeRobot.js

Jul 14, 2025

upvoted 2 articles 9 months ago

Article

Learn the Hugging Face Kernel Hub in 5 Minutes

Jun 12, 2025 • 161

Article

KV Cache from scratch in nanoVLM

Jun 4, 2025 • 113

upvoted an article 10 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

May 21, 2025 • 252

upvoted a paper 10 months ago

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Paper • 2505.04601 • Published May 7, 2025 • 29

upvoted a collection 11 months ago

Vision

Collection

163 items • Updated Jan 7 • 1

upvoted an article 11 months ago

Article

Remote VAEs for decoding with Inference Endpoints 🤗

Feb 24, 2025

upvoted a paper 11 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 205
