QuantFactory/st-vicuna-v1.3-5.5b-ppl-GGUF
This is a quantized version of nota-ai/st-vicuna-v1.3-5.5b-ppl, created using llama.cpp.
Model Description
Shortened LLaMA Model Card
Shortened LLaMA is a depth-pruned version of LLaMA models & variants for efficient text generation.
- Developed by: Nota AI
- License: Non-commercial license
- Repository: https://github.com/Nota-NetsPresso/shortened-llm
- Paper: https://arxiv.org/abs/2402.02834
Compression Method
After identifying unimportant Transformer blocks, we perform one-shot pruning and light LoRA-based retraining.
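The block-selection and removal steps can be sketched as follows. This is a minimal illustration under stated assumptions, not Nota AI's implementation: it assumes a per-block importance score has already been computed (for the PPL criterion, this could be the perplexity measured with that block removed, where a lower value suggests the block matters less). The names `select_blocks_to_prune` and `prune_blocks` are hypothetical, the scores are made up, and the LoRA retraining step is omitted.

```python
from typing import List, Sequence


def select_blocks_to_prune(importance: Sequence[float], num_to_prune: int) -> List[int]:
    """Return the indices of the least important blocks, in ascending index order.

    `importance[i]` is an assumed, precomputed score for block i;
    lower means the block is considered less important.
    """
    if not 0 <= num_to_prune <= len(importance):
        raise ValueError("num_to_prune must be between 0 and the number of blocks")
    # Rank block indices from least to most important.
    ranked = sorted(range(len(importance)), key=lambda i: importance[i])
    return sorted(ranked[:num_to_prune])


def prune_blocks(blocks: Sequence, pruned: Sequence[int]) -> list:
    """One-shot pruning: drop all selected blocks at once, keeping the original order."""
    drop = set(pruned)
    return [block for i, block in enumerate(blocks) if i not in drop]


# Toy example with six stand-in blocks and invented scores.
importance = [9.1, 7.4, 5.2, 5.0, 5.3, 8.8]
blocks = [f"block_{i}" for i in range(len(importance))]

pruned = select_blocks_to_prune(importance, num_to_prune=2)
kept = prune_blocks(blocks, pruned)
print("pruned indices:", pruned)
print("remaining blocks:", kept)
```

In a real model, `blocks` would be the list of Transformer layers, and the remaining layers would then be fine-tuned briefly to recover quality.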
Model Links
Source Model | Pruning Ratio | Pruning Criterion | HF Models Link |
---|---|---|---|
LLaMA-1-7B | 20% | PPL | nota-ai/st-llama-1-5.5b-ppl |
LLaMA-1-7B | 20% | Taylor+ | nota-ai/st-llama-1-5.5b-taylor |
Vicuna-v1.3-7B | 20% | PPL | nota-ai/st-vicuna-v1.3-5.5b-ppl |
Vicuna-v1.3-7B | 20% | Taylor+ | nota-ai/st-vicuna-v1.3-5.5b-taylor |
Vicuna-v1.3-13B | 21% | PPL | nota-ai/st-vicuna-v1.3-10.5b-ppl |
Vicuna-v1.3-13B | 21% | Taylor+ | nota-ai/st-vicuna-v1.3-10.5b-taylor |
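To relate the model names to depth pruning, the following rough calculation estimates how the parameter count shrinks as whole Transformer blocks are removed. It is a sketch under assumptions not stated in this card: the dimensions are those of the publicly released LLaMA-7B configuration (32 blocks, hidden size 4096, MLP size 11008, vocabulary 32000, untied embeddings), and `llama_param_count` is a hypothetical helper.

```python
def llama_param_count(num_blocks: int, hidden: int = 4096,
                      intermediate: int = 11008, vocab: int = 32000) -> int:
    """Approximate parameter count of a LLaMA-style decoder (assumed dimensions)."""
    attention = 4 * hidden * hidden       # q, k, v, and output projections
    mlp = 3 * hidden * intermediate       # gate, up, and down projections
    norms = 2 * hidden                    # two RMSNorm weight vectors per block
    per_block = attention + mlp + norms
    embeddings = 2 * vocab * hidden       # input embedding plus untied output head
    final_norm = hidden
    return num_blocks * per_block + embeddings + final_norm


full = llama_param_count(32)
for removed in range(0, 9):
    remaining = llama_param_count(32 - removed)
    reduction = 1 - remaining / full
    print(f"{removed} blocks removed: {remaining / 1e9:.2f}B parameters "
          f"({reduction:.1%} fewer)")
```

Under these assumptions, keeping 26 of the 32 blocks gives roughly 5.5B parameters, consistent with the "5.5b" in the model names. The reduction computed this way need not match the table's rounded pruning ratio exactly, since that depends on which parameters are counted.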
Zero-shot Performance & Efficiency Results
- EleutherAI/lm-evaluation-harness version 3326c54
License
- All rights related to this repository and the compressed models are reserved by Nota Inc.
- The intended use is strictly limited to research and non-commercial projects.
Model Acknowledgments
- LLM-Pruner, which utilizes LM Evaluation Harness, PEFT, and Alpaca-LoRA. Thanks for the pioneering work on structured pruning of LLMs!
- Meta AI's LLaMA and LMSYS Org's Vicuna. Thanks for the open-source LLMs!
Original Model Citation
@article{kim2024shortened,
title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
journal={arXiv preprint arXiv:2402.02834},
year={2024},
url={https://arxiv.org/abs/2402.02834}
}
@article{kim2024mefomo,
title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
journal={ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)},
year={2024},
url={https://openreview.net/forum?id=18VGxuOdpu}
}