MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 2025 • 252
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Paper • 2505.22618 • Published May 28, 2025 • 42
Inference-Time Hyper-Scaling with KV Cache Compression Paper • 2506.05345 • Published Jun 5, 2025 • 27
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published May 14, 2025 • 65
meta-llama/Llama-4-Scout-17B-16E-Instruct Image-Text-to-Text • 109B • Updated May 22 • 675k • 987
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated May 1 • 409k • 1.45k
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16, 2025 • 160
The Ultra-Scale Playbook 🌌 Space • Running • 2.78k • The ultimate guide to training LLMs on large GPU clusters
NanoFlow: Towards Optimal Large Language Model Serving Throughput Paper • 2408.12757 • Published Aug 22, 2024 • 18
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 161