---
base_model:
library_name: transformers
---
# MaskLLM: Learnable Semi-structured Sparsity for Large Language Models

<div align="center">
<figure>
<img src="https://github.com/NVlabs/MaskLLM/blob/main/assets/teaser.png?raw=true" style="width:70%; display:block; margin-left:auto; margin-right:auto;">
</figure>
</div>

This work introduces [MaskLLM](https://github.com/NVlabs/MaskLLM), a **learnable** pruning method that establishes **semi-structured (or "N:M") sparsity** in LLMs, aimed at reducing computational overhead during inference. The method is scalable and stands to benefit from larger training datasets.
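For intuition about the pattern itself (not about MaskLLM's learning procedure, which learns masks end-to-end), a 2:4 mask keeps at most two nonzero values in every contiguous group of four weights. The sketch below builds such a mask with a simple magnitude criterion; the helper name and the magnitude rule are our own illustration, not the method:

```python
import numpy as np

def mask_2_to_4(weights: np.ndarray) -> np.ndarray:
    """Keep the 2 largest-magnitude entries in each contiguous group of 4.

    Magnitude is only a stand-in criterion for illustration; MaskLLM
    learns the mask selection rather than deriving it from magnitudes.
    """
    groups = np.abs(weights).reshape(-1, 4)        # one row per group of 4
    keep = np.argsort(-groups, axis=1)[:, :2]      # indices of the top-2 entries
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, keep, 1.0, axis=1)     # exactly 2 ones per group
    return mask.reshape(weights.shape)

w = np.random.randn(4, 8).astype(np.float32)       # weight count divisible by 4
m = mask_2_to_4(w)
assert (m.reshape(-1, 4).sum(axis=1) == 2).all()   # 2:4 constraint holds
sparse_w = w * m                                   # 2:4 semi-structured weights
```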
## Requirements

We provide pre-computed masks for Hugging Face models such as Llama-2 7B and Llama-3 8B, with minimal requirements: no Docker, Megatron, or data preprocessing is involved.

```bash
pip install transformers accelerate datasets SentencePiece
```
## Pre-computed Masks

The following masks were trained and provided by [@VainF](https://github.com/VainF). We use `huggingface_hub` to automatically download these masks and apply them to the official LLMs for evaluation; a minimal download sketch follows the table below. The mask files were compressed using [numpy.savez_compressed](tool_compress_mask.py). More results for baselines (SparseGPT, Wanda) can be found in the appendix.

| Model | Pattern | Training Data | Training/Eval SeqLen | PPL (Dense) | PPL (Sparse) | Link |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| LLaMA-2 7B | 2:4 | C4 (2B Tokens) | 4096 | 5.12 | 6.78 | [HuggingFace](https://huggingface.co/Vinnnf/LLaMA-2-7B-MaskLLM-C4) |
| LLaMA-3 8B | 2:4 | C4 (2B Tokens) | 4096 | 5.75 | 8.49 | [HuggingFace]() |
| LLaMA-3.1 8B | 2:4 | C4 (2B Tokens) | 4096 | - | - | Coming Soon |
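A minimal sketch for downloading and inspecting a mask checkpoint with `huggingface_hub`. The `.npz` filename is a placeholder assumption; check the repository's file list for the actual name:

```python
import numpy as np
from huggingface_hub import hf_hub_download

# Download one pre-computed mask file from the Hub.
# NOTE: "mask.npz" is a placeholder filename (an assumption); check the
# repository's file list for the actual name of the compressed mask file.
path = hf_hub_download(
    repo_id="Vinnnf/LLaMA-2-7B-MaskLLM-C4",
    filename="mask.npz",
)

masks = np.load(path)      # NpzFile written by numpy.savez_compressed
for name in masks.files:   # one entry per pruned weight tensor
    print(name, masks[name].shape, masks[name].dtype)
```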
## How to use it

Please see [NVlabs/MaskLLM](https://github.com/NVlabs/MaskLLM). A hedged sketch of what applying a mask amounts to is shown below.
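Purely as a sketch under stated assumptions, applying a mask reduces to an element-wise multiply of each weight tensor by its binary mask. The code assumes the `.npz` entry names match the model's `state_dict` keys and that each mask has the same shape as its weight; verify both against the official repo:

```python
import numpy as np
import torch
from transformers import AutoModelForCausalLM

# Hedged sketch: zero out pruned weights of a dense HF checkpoint.
# Assumptions (verify against NVlabs/MaskLLM): the .npz entry names
# match state_dict keys, and each mask has the same shape as its weight.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
masks = np.load("mask.npz")  # the file downloaded in the sketch above

state_dict = model.state_dict()
with torch.no_grad():
    for name in masks.files:
        if name in state_dict:
            w = state_dict[name]  # shares storage with the model parameter
            w.mul_(torch.from_numpy(masks[name]).to(dtype=w.dtype, device=w.device))
```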