momo

wzc991222

AI & ML interests

None yet

Recent Activity

liked a model 1 day ago

google/gemma-3n-E4B

upvoted a paper 4 days ago

Towards AI Search Paradigm

Paper • 2506.17188 • Published 7 days ago • 5

upvoted a paper 9 days ago

Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4, 2024 • 42

upvoted 3 papers 11 days ago

Serving Large Language Models on Huawei CloudMatrix384

Paper • 2506.12708 • Published 13 days ago • 1

Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published 11 days ago • 41

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published 12 days ago • 240

upvoted 2 papers 18 days ago

SpatialLM: Training Large Language Models for Structured Indoor Modeling

Paper • 2506.07491 • Published 19 days ago • 38

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published 18 days ago • 81

upvoted 4 papers about 1 month ago

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 204

Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15 • 81

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 69

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published May 14 • 65

upvoted 3 papers about 2 months ago

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Paper • 2505.02567 • Published May 5 • 76

A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency

Paper • 2505.01658 • Published May 3 • 36

Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

Paper • 2505.01043 • Published May 2 • 10

upvoted 6 papers 2 months ago

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 71

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 295

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 119

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 404

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 235

Sleep-time Compute: Beyond Inference Scaling at Test-time

Paper • 2504.13171 • Published Apr 17 • 15