soates (Stephen Oates)

upvoted 2 articles 5 months ago

Train AI models with Unsloth and Hugging Face Jobs for FREE

Article • burtenshaw, danielhanchen, shimmyshimmer, mlabonne, davanstrien, evalstate • Feb 20 • 103

We Got Claude to Build CUDA Kernels and teach open models!

Article • burtenshaw, evalstate, merve, pcuenq • Jan 28 • 158

upvoted 2 articles 7 months ago

Deriving the PPO Loss from First Principles

Article • garg-aayush • Dec 25, 2025 • 46

How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day

Article • sionic-ai • Dec 8, 2025 • 60

upvoted a collection 7 months ago

Physics of Language Models: Part 4.2

Collection • 16 items • Updated Jul 29, 2025 • 20

upvoted an article 7 months ago

We Got Claude to Fine-Tune an Open Source LLM

Article • burtenshaw, evalstate • Dec 4, 2025 • 630

upvoted a paper 9 months ago

The Massive Legal Embedding Benchmark (MLEB)

Paper • 2510.19365 • Published Oct 22, 2025 • 18

upvoted an article 9 months ago

Australian-made LLM beats OpenAI and Google at legal retrieval

Article • isaacus • Oct 23, 2025 • 28

upvoted an article 10 months ago

There is no such thing as a tokenizer-free lunch

Article • catherinearnett • Sep 25, 2025 • 101

upvoted 2 papers 10 months ago

Virtual Agent Economies

Paper • 2509.10147 • Published Sep 12, 2025 • 27

The Majority is not always right: RL training for solution aggregation

Paper • 2509.06870 • Published Sep 8, 2025 • 15

upvoted 2 papers about 1 year ago

Large Language Models are Locally Linear Mappings

Paper • 2505.24293 • Published May 30, 2025 • 14

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Paper • 2505.11711 • Published May 16, 2025 • 10

upvoted 2 articles about 1 year ago

nanoVLM: The simplest repository to train your VLM in pure PyTorch

Article • ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025 • 262

Tiny Agents: an MCP-powered agent in 50 lines of code

Article • julien-c • Apr 25, 2025 • 308

upvoted a paper about 1 year ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 141

upvoted an article about 1 year ago

Gotchas in Tokenizer Behavior Every Developer Should Know

Article • qgallouedec • Apr 18, 2025 • 72

upvoted a collection over 1 year ago

Gemma 3

Collection

All versions of Google's new multimodal models including QAT in 1B, 4B, 12B, and 27B sizes. In GGUF, dynamic 4-bit and 16-bit formats. • 54 items • Updated about 9 hours ago • 116

upvoted 2 articles over 1 year ago

Open-R1: Update #1

Article • open-r1 • Feb 2, 2025 • 304

Open-R1: a fully open reproduction of DeepSeek-R1

Article • eliebak, lvwerra, lewtun • Jan 28, 2025 • 890

Stephen Oates PRO

AI & ML interests

Organizations
