Liu

Shiweiliuiiiiiii

https://shiweiliuiiiiiii.github.io/

Shiwei_Liu66

AI & ML interests

LLM, reasoning, ML efficiency

Recent Activity

authored a paper 7 days ago

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning

authored a paper 7 days ago

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

authored a paper 7 days ago

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

Organizations

None yet

Shiweiliuiiiiiii's activity

authored 16 papers 7 days ago

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning

Paper • 2405.18380 • Published May 28, 2024 • 1

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

Paper • 2404.03865 • Published Apr 5, 2024

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

Paper • 2403.04797 • Published Mar 5, 2024 • 1

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

Paper • 2202.02643 • Published Feb 5, 2022 • 1

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

Paper • 2106.10404 • Published Jun 19, 2021 • 1

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter

Paper • 2306.03805 • Published Jun 6, 2023 • 1

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs

Paper • 2310.08915 • Published Oct 13, 2023

AdaMerging: Adaptive Model Merging for Multi-Task Learning

Paper • 2310.02575 • Published Oct 4, 2023 • 1

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

Paper • 2303.01610 • Published Mar 2, 2023

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

Paper • 2407.08296 • Published Jul 11, 2024 • 33

From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients

Paper • 2407.11239 • Published Jul 15, 2024 • 8

O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

Paper • 2501.12570 • Published Jan 22, 2025 • 24

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published 29 days ago • 35

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

Paper • 2502.07490 • Published 27 days ago • 9

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

Paper • 2502.17055 • Published 14 days ago • 16

SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers

Paper • 2502.20545 • Published 10 days ago • 20

upvoted a paper 7 days ago

SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers

Paper • 2502.20545 • Published 10 days ago • 20

upvoted a paper 13 days ago

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

Paper • 2502.17055 • Published 14 days ago • 16

upvoted a paper 26 days ago

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

Paper • 2502.07490 • Published 27 days ago • 9

commented on a paper 26 days ago

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

Paper • 2502.07490 • Published 27 days ago • 9