Sweker

Swekerr

AI & ML interests

None yet

Recent Activity

updated a model 2 days ago

Swekerr/toxy-smollm2-360m-sft-v1.5

published a model 2 days ago

Swekerr/toxy-smollm2-360m-sft-v1.5

updated a model 8 days ago

Swekerr/toxy-smollm2-360m-sft-v1.0

upvoted 2 articles about 2 months ago

Article

SmolLM3: smol, multilingual, long-context reasoner

and 22 others •

Jul 8

• 646

Article

Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders

and 1 other •

Jul 9

• 666

upvoted an article 2 months ago

Article

Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models

and 8 others •

Jul 4

• 9

upvoted an article 3 months ago

Article

🐯 Liger GRPO meets TRL

and 5 others •

May 25

• 49

upvoted a paper 3 months ago

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4, 2024 • 101

upvoted 3 articles 4 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

and 6 others •

May 21

• 208

Article

The Transformers Library: standardizing model definitions

and 3 others •

May 15

• 117

Article

Vision Language Models Explained

and 1 other •

Apr 11, 2024

• 444

upvoted a paper 4 months ago

Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers

Paper • 2504.20752 • Published Apr 29 • 93

upvoted 2 articles 4 months ago

Article

Train your first Decision Transformer

and 1 other •

Sep 8, 2022

• 14

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

•

Feb 7

• 211

upvoted a paper 5 months ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 126

upvoted 4 articles 5 months ago

Article

What is test-time compute and how to scale it?

and 1 other •

Feb 6

• 103

Article

Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers?

and 1 other •

Apr 4

• 14

Article

Introducing the Synthetic Data Generator - Build Datasets with Natural Language

and 5 others •

Dec 16, 2024

• 137

Article

Introducing RWKV — An RNN with the advantages of a transformer

and 3 others •

May 15, 2023

• 23

upvoted a paper 5 months ago

FFN Fusion: Rethinking Sequential Computation in Large Language Models

Paper • 2503.18908 • Published Mar 24 • 20

upvoted 2 articles 6 months ago

Article

Open-Source Handwritten Signature Detection Model

•

Mar 14

• 118

Article

Putting RL back in RLHF

and 1 other •

Jun 12, 2024

• 100

upvoted a paper 7 months ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 203
