Stephen Oates PRO

soates

AI & ML interests

None yet

Recent Activity

Organizations

None yet

upvoted an article 15 days ago

Article

Deriving the PPO Loss from First Principles

16 days ago

upvoted an article about 1 month ago

Article

How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day

Dec 8, 2025

upvoted a collection about 1 month ago

Physics of Language Models: Part 4.2

Collection

16 items • Updated Jul 29, 2025 • 15

upvoted an article about 1 month ago

Article

We Got Claude to Fine-Tune an Open Source LLM

Dec 4, 2025 • 570

upvoted a paper 3 months ago

The Massive Legal Embedding Benchmark (MLEB)

Paper • 2510.19365 • Published Oct 22, 2025 • 17

upvoted an article 3 months ago

Article

Australian-made LLM beats OpenAI and Google at legal retrieval

Oct 23, 2025

upvoted an article 4 months ago

Article

There is no such thing as a tokenizer-free lunch

Sep 25, 2025

updated a dataset 4 months ago

soates/australian-insurance-dspy-corpus

Viewer • Updated Sep 17, 2025 • 359 • 19

published a dataset 4 months ago

soates/australian-insurance-dspy-corpus

Viewer • Updated Sep 17, 2025 • 359 • 19

upvoted 2 papers 4 months ago

Virtual Agent Economies

Paper • 2509.10147 • Published Sep 12, 2025 • 26

The Majority is not always right: RL training for solution aggregation

Paper • 2509.06870 • Published Sep 8, 2025 • 16

updated a dataset 5 months ago

soates/tictactoe-gemma-dataset

Viewer • Updated Aug 15, 2025 • 93.6k • 9

published a dataset 5 months ago

soates/tictactoe-gemma-dataset

Viewer • Updated Aug 15, 2025 • 93.6k • 9

liked a model 6 months ago

Menlo/Lucy-128k

Text Generation • 2B • Updated Aug 4, 2025 • 175 • 108

liked a model 7 months ago

chandar-lab/NeoBERT

Feature Extraction • 0.2B • Updated Mar 25, 2025 • 2.77k • 186

upvoted a paper 7 months ago

Large Language Models are Locally Linear Mappings

Paper • 2505.24293 • Published May 30, 2025 • 14

upvoted a paper 8 months ago

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Paper • 2505.11711 • Published May 16, 2025 • 11

upvoted an article 8 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

May 21, 2025 • 247

upvoted an article 9 months ago

Article

Tiny Agents: an MCP-powered agent in 50 lines of code

Apr 25, 2025 • 305

upvoted a paper 9 months ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 139
