Chenyang Song

Raincleared

AI & ML interests

None yet

Recent Activity

authored a paper 15 days ago

ConPET: Continual Parameter-Efficient Tuning for Large Language Models

authored a paper 15 days ago

ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs

authored a paper 15 days ago

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models

Organizations

upvoted a paper 16 days ago

BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity

Paper • 2507.08771 • Published 19 days ago • 9

upvoted a collection about 2 months ago

MiniCPM4

Collection

MiniCPM4: Ultra-Efficient LLMs on End Devices • 22 items • Updated Jun 22 • 70

upvoted a paper 3 months ago

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

Paper • 2504.17768 • Published Apr 24 • 14

upvoted a paper 5 months ago

SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models

Paper • 2503.07605 • Published Mar 10 • 69

upvoted a paper 8 months ago

Densing Law of LLMs

Paper • 2412.04315 • Published Dec 5, 2024 • 19

upvoted a paper 9 months ago

Sparsing Law: Towards Large Language Models with Greater Activation Sparsity

Paper • 2411.02335 • Published Nov 4, 2024 • 11

upvoted a paper 11 months ago

Configurable Foundation Models: Building LLMs from a Modular Perspective

Paper • 2409.02877 • Published Sep 4, 2024 • 31

upvoted 3 papers about 1 year ago

upvoted a collection about 1 year ago

MiniCPM

Collection

The MiniCPM family of LLMs and VLLMs. • 33 items • Updated 20 days ago • 70

upvoted a collection over 1 year ago

Meta Llama 3

Collection

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 820

upvoted 7 papers over 1 year ago

In deep reinforcement learning, a pruned network is a good network

Paper • 2402.12479 • Published Feb 19, 2024 • 19

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21, 2024 • 117

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26, 2024 • 75

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 55

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 159

Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

Paper • 2401.00448 • Published Dec 31, 2023 • 31

Beyond Surface: Probing LLaMA Across Scales and Layers

Paper • 2312.04333 • Published Dec 7, 2023 • 20
