---
base_model:
- meta-llama/Meta-Llama-3-8B
library_name: transformers
---

# MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
This work introduces [MaskLLM](https://github.com/NVlabs/MaskLLM), a **learnable** pruning method that establishes **semi-structured (or "N:M") sparsity** in LLMs, aimed at reducing computational overhead during inference. The method is scalable and benefits from larger training datasets.

## Requirements

We provide pre-computed masks for Hugging Face models such as LLaMA-2 7B and LLaMA-3 8B with minimal requirements: no Docker, Megatron, or data preprocessing is involved.

```bash
pip install transformers accelerate datasets SentencePiece
```

## Pre-computed Masks

The following masks were trained and provided by [@VainF](https://github.com/VainF). We use `huggingface_hub` to automatically download these masks and apply them to the official LLMs for evaluation. The mask files were compressed with [numpy.savez_compressed](tool_compress_mask.py). More results for baselines (SparseGPT, Wanda) can be found in the appendix.

| Model | Pattern | Training Data | Training/Eval SeqLen | PPL (Dense) | PPL (Sparse) | Link |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| LLaMA-2 7B | 2:4 | C4 (2B Tokens) | 4096 | 5.12 | 6.78 | [HuggingFace](https://huggingface.co/Vinnnf/LLaMA-2-7B-MaskLLM-C4) |
| LLaMA-3 8B | 2:4 | C4 (2B Tokens) | 4096 | 5.75 | 8.49 | [HuggingFace]() |
| LLaMA-3.1 8B | 2:4 | C4 (2B Tokens) | 4096 | - | - | Coming Soon |

## How to use it

Please see [NVlabs/MaskLLM](https://github.com/NVlabs/MaskLLM).
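
For a quick, self-contained experiment outside the official repository, a minimal sketch of applying a pre-computed 2:4 mask to a Hugging Face checkpoint might look like the snippet below: download the compressed mask archive from the Hub, load it with NumPy, and multiply each mask into the corresponding dense weight. The archive filename (`mask.npz`) and its layout (one boolean array per prunable weight, keyed by parameter name) are assumptions for illustration; the exact format is defined by the mask repositories and the official MaskLLM code.

```python
import numpy as np
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM

# Download the compressed mask archive (filename is a placeholder assumption;
# check the mask repository listed in the table above for the real file name).
mask_file = hf_hub_download(
    repo_id="Vinnnf/LLaMA-2-7B-MaskLLM-C4",
    filename="mask.npz",
)
masks = np.load(mask_file)  # NpzFile: assumed to map parameter names -> boolean masks

# Load the official dense checkpoint (gated; requires access on the Hub).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
)

# Apply each mask in place: the 2:4 pattern keeps 2 of every 4 consecutive
# weights and zeros out the other 2.
with torch.no_grad():
    for name, param in model.named_parameters():
        if name in masks.files:
            mask = torch.from_numpy(masks[name]).to(device=param.device, dtype=param.dtype)
            param.mul_(mask)
```

The resulting model can then be evaluated for perplexity with any standard pipeline; for the exact evaluation setup used to produce the numbers in the table, refer to [NVlabs/MaskLLM](https://github.com/NVlabs/MaskLLM).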