# Mixture of Attentions for Speculative Decoding

These are the checkpoints from "[Mixture of Attentions For Speculative Decoding](https://arxiv.org/abs/2410.03804)" by Matthieu Zimmer*, Milan Gritta*, Gerasimos Lampouras, Haitham Bou Ammar, and Jun Wang.
The paper introduces a novel architecture for speculative decoding that enhances the speed of large language model (LLM) inference.

It is supported in vLLM; see our [GitHub repository](https://github.com/huawei-noah/HEBO/tree/mixture-of-attentions/) for details.
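Below is a minimal sketch of how the drafter might be plugged in when using that branch. The argument names `speculative_model` and `num_speculative_tokens` follow vLLM's generic speculative-decoding interface and are assumptions here; check the repository above for the exact API.

```python
# Hypothetical sketch: serving the base model with the MOA Spec drafter via vLLM.
# The speculative-decoding argument names below are assumptions; the
# mixture-of-attentions branch may expose a different interface.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",                  # base (target) model
    speculative_model="huawei-noah/MOASpec-Llama-3-8B-Instruct",  # MOA Spec drafter (assumed argument name)
    num_speculative_tokens=5,                                     # tokens drafted per step (assumed argument name)
)

params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(["Explain speculative decoding in one sentence."], params)
print(outputs[0].outputs[0].text)
```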

### Checkpoints

| Base Model | MOA Spec on Hugging Face | Base Model Parameters | MOA Spec Parameters |
|------|------|------|------|
| meta-llama/Meta-Llama-3-8B-Instruct | [huawei-noah/MOASpec-Llama-3-8B-Instruct](https://huggingface.co/huawei-noah/MOASpec-Llama-3-8B-Instruct) | 8B | 0.25B |
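The drafter weights can be fetched like any other Hugging Face repository, for example with `huggingface_hub`:

```python
# Download the MOA Spec drafter checkpoint from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("huawei-noah/MOASpec-Llama-3-8B-Instruct")
print(f"Checkpoint downloaded to: {local_dir}")
```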

## Citation

If you use this code or this checkpoint in your research, please cite our paper:

```bibtex
@misc{zimmer2024mixtureattentionsspeculativedecoding,
  title={Mixture of Attentions For Speculative Decoding},
  author={Matthieu Zimmer and Milan Gritta and Gerasimos Lampouras and Haitham Bou Ammar and Jun Wang},
  year={2024},
  eprint={2410.03804},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.03804},
}
```

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.

Disclaimer: This open-source project is not an official Huawei product; Huawei is not expected to provide support for it.