
Shwai He

Shwai

https://shwai-he.github.io/

Shwai-He

AI & ML interests

Deep Learning, Machine Learning, Natural Language Processing.

Recent Activity

upvoted a collection 10 days ago

Qwen2.5

Collection

Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated Dec 31, 2025 • 685

upvoted a paper about 1 month ago

Making Large Language Models Efficient Dense Retrievers

Paper • 2512.20612 • Published Dec 23, 2025 • 2

upvoted a paper 2 months ago

Understanding and Harnessing Sparsity in Unified Multimodal Models

Paper • 2512.02351 • Published Dec 2, 2025 • 2

commented a paper 2 months ago

Understanding and Harnessing Sparsity in Unified Multimodal Models

Paper • 2512.02351 • Published Dec 2, 2025 • 2

upvoted a paper 2 months ago

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Paper • 2511.09611 • Published Nov 12, 2025 • 70

upvoted a paper 3 months ago

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published Nov 4, 2025 • 59

upvoted a collection 4 months ago

Qwen3-VL

Collection

37 items • Updated Dec 31, 2025 • 616

upvoted a paper 5 months ago

Dense Video Understanding with Gated Residual Tokenization

Paper • 2509.14199 • Published Sep 17, 2025 • 2

liked a dataset 5 months ago

haichaozhang/DenseVideoEvaluation

Preview • Updated Sep 18, 2025 • 15 • 2

upvoted a paper 8 months ago

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Paper • 2506.01713 • Published Jun 2, 2025 • 48

liked a model 10 months ago

Qwen/QwQ-32B

Text Generation • 33B • Updated Mar 11, 2025 • 44.7k • 2.89k

upvoted a collection 11 months ago

computation

Collection

this is for Mixture of XXX • 1 item • Updated Oct 23, 2024 • 2

upvoted 2 papers 11 months ago

Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers

Paper • 2410.13184 • Published Oct 17, 2024 • 3

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts

Paper • 2503.05066 • Published Mar 7, 2025 • 4

commented a paper 11 months ago

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts

Paper • 2503.05066 • Published Mar 7, 2025 • 4

upvoted 2 papers about 1 year ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 437

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 25

liked 3 models over 1 year ago
