bfuzzy1 (Robin Williams)

upvoted a paper 4 months ago

Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

Paper • 2509.01363 • Published Sep 1, 2025 • 58

upvoted an article 5 months ago

Article

SmolLM3: smol, multilingual, long-context reasoner

+21

Jul 8, 2025 • 743

upvoted a collection 5 months ago

Encoders vs Decoders: the Ettin Suite

Collection

A collection of SOTA, open-data, paired encoder-only and decoder-only models ranging from 17M params to 1B. See the paper at https://arxiv.org/abs/250 • 32 items • Updated Jul 16, 2025 • 25

upvoted 3 papers 6 months ago

upvoted an article 6 months ago

Article

Transformers Are Getting Old: Variants and Alternatives Exist!

Jul 5, 2025 • 44

upvoted 5 papers 6 months ago

PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models

Paper • 2506.16054 • Published Jun 19, 2025 • 60

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19, 2025 • 130

FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies

Paper • 2506.17673 • Published Jun 21, 2025 • 7

Steering Conceptual Bias via Transformer Latent-Subspace Activation

Paper • 2506.18887 • Published Jun 23, 2025 • 6

Orthogonal Finetuning Made Scalable

Paper • 2506.19847 • Published Jun 24, 2025 • 11

upvoted 7 papers 7 months ago

RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling

Paper • 2506.08672 • Published Jun 10, 2025 • 30

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2, 2025 • 187

Shifting AI Efficiency From Model-Centric to Data-Centric Compression

Paper • 2505.19147 • Published May 25, 2025 • 144

Truth Neurons

Paper • 2505.12182 • Published May 18, 2025 • 8

dKV-Cache: The Cache for Diffusion Language Models

Paper • 2505.15781 • Published May 21, 2025 • 16

Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Paper • 2505.15045 • Published May 21, 2025 • 54

Position of Uncertainty: A Cross-Linguistic Study of Positional Bias in Large Language Models

Paper • 2505.16134 • Published May 22, 2025 • 18

upvoted a paper 8 months ago

TransMLA: Multi-head Latent Attention Is All You Need

Paper • 2502.07864 • Published Feb 11, 2025 • 57

FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Teach Old SAEs New Domain Tricks with Boosting

Scaling Laws for Optimal Data Mixtures
