
Tony Wu

tonywu71

AI & ML interests

LLM, Multimodal, Agents, Information Retrieval, RAG, Speech

Recent Activity

liked a Space about 1 month ago

OpenEvals/evaluation-guidebook

upvoted an article about 1 month ago

Transformers v5: Simple model definitions powering the AI ecosystem

upvoted an article about 1 month ago

Continuous batching from first principles



upvoted 2 articles about 1 month ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • Dec 1, 2025 • 265

Continuous batching from first principles

Article • Nov 25, 2025 • 297

upvoted a paper 2 months ago

Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

Paper • 2510.19949 • Published Oct 22, 2025 • 38

upvoted a collection 4 months ago

Holo1.5

Collection

Holo1.5 - Open Foundation Models for Computer Use Agents • 5 items • Updated Sep 15, 2025 • 34

upvoted 3 articles 6 months ago

You could have designed state of the art positional encoding

Article • Nov 25, 2024 • 429

Merge Large Language Models with mergekit

Article • Jan 9, 2024 • 147

SmolLM3: smol, multilingual, long-context reasoner

Article • Jul 8, 2025 • 743

upvoted a collection 7 months ago

Holo1

Collection

Vision-Language Action Model for use in Surfer-H web navigation agent • 6 items • Updated Jun 10, 2025 • 48

upvoted 3 articles 8 months ago

nanoVLM: The simplest repository to train your VLM in pure PyTorch

Article • May 21, 2025 • 247

Preference Optimization for Vision Language Models

Article • Jul 10, 2024

Vision Language Models (Better, faster, stronger)

Article • May 12, 2025 • 580

upvoted an article 9 months ago

Gotchas in Tokenizer Behavior Every Developer Should Know

Article • Apr 18, 2025

upvoted 2 papers 9 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 306

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 202

upvoted an article 9 months ago

ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval

Article • Mar 18, 2025

upvoted an article 11 months ago

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

Article • Jul 29, 2024 • 365

upvoted a paper 11 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20, 2025 • 157

upvoted 2 articles 11 months ago

SigLIP 2: A better multilingual vision language encoder

Article • Feb 21, 2025 • 193

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

Article • Feb 19, 2025

upvoted a paper 11 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 253
