

	---
	license: mit
	---

	Please refer to the [SepLLM paper - ICML 2025](https://arxiv.org/abs/2412.12094) and our [`GitHub repository`](https://github.com/HKUDS/SepLLM) for details on using this model.

	To use this checkpoint, you must install the `transformers-4.38.0.post1+sepllm-py3-none-any.whl` package released in our [`GitHub repository`](https://github.com/HKUDS/SepLLM). The reference evaluation script and a sample of its results are shown below. We ran the evaluation with `lm_eval==0.4.0`.

	```
	# Set the variable on the command line itself so it reaches lm_eval's environment.
	CUDA_LAUNCH_BLOCKING=1 lm_eval --model hf \
	--model_args pretrained=Gausson/pythia-160m-deduped-n64-SepLLM \
	--tasks arc_challenge,arc_easy,lambada_openai,logiqa,piqa,sciq,winogrande,wsc,wikitext \
	--num_fewshot 5 \
	--device cuda:0 \
	--batch_size 32
	```
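The same evaluation can, in principle, be driven from Python rather than the shell. The following is a minimal sketch, assuming `lm_eval==0.4.0` and the SepLLM `transformers` wheel are installed; `build_eval_kwargs` and `run_eval` are hypothetical helper names introduced here for illustration and are not part of the SepLLM repository.

```python
# Sketch only: mirrors the CLI flags above through lm_eval's Python API.
# Assumes lm_eval==0.4.0 and the SepLLM transformers wheel are installed.

TASKS = [
    "arc_challenge", "arc_easy", "lambada_openai", "logiqa",
    "piqa", "sciq", "winogrande", "wsc", "wikitext",
]


def build_eval_kwargs(device="cuda:0", batch_size=32, num_fewshot=5):
    """Collect keyword arguments matching the CLI invocation (hypothetical helper)."""
    return {
        "model": "hf",
        "model_args": "pretrained=Gausson/pythia-160m-deduped-n64-SepLLM",
        "tasks": list(TASKS),
        "num_fewshot": num_fewshot,
        "device": device,
        "batch_size": batch_size,
    }


def run_eval(**overrides):
    """Run the evaluation; this downloads the checkpoint and uses a GPU by default."""
    import lm_eval
    import lm_eval.tasks

    # lm_eval 0.4.0 registers tasks explicitly; the guard covers versions
    # where this helper is absent.
    init = getattr(lm_eval.tasks, "initialize_tasks", None)
    if init is not None:
        init()

    results = lm_eval.simple_evaluate(**build_eval_kwargs(**overrides))
    return results["results"]  # per-task metric dictionary
```

Calling `run_eval()` should produce per-task metrics comparable to the sample results; `run_eval(device="cpu", batch_size=1)` is a slower fallback when no GPU is available. If `CUDA_LAUNCH_BLOCKING=1` is wanted, set it in the environment before starting Python.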

	```
	hf (pretrained=Gausson/pythia-160m-deduped-n64-SepLLM), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 32
	|    Tasks     |Version|Filter|n-shot|    Metric     |   | Value |   |Stderr|
	|--------------|------:|------|-----:|---------------|---|------:|---|------|
	|arc_challenge |      1|none  |     5|acc            |↑  | 0.1962|±  |0.0116|
	|              |       |none  |     5|acc_norm       |↑  | 0.2406|±  |0.0125|
	|arc_easy      |      1|none  |     5|acc            |↑  | 0.4655|±  |0.0102|
	|              |       |none  |     5|acc_norm       |↑  | 0.4377|±  |0.0102|
	|lambada_openai|      1|none  |     5|acc            |↑  | 0.2909|±  |0.0063|
	|              |       |none  |     5|perplexity     |↓  |40.0674|±  |1.3492|
	|logiqa        |      1|none  |     5|acc            |↑  | 0.2642|±  |0.0173|
	|              |       |none  |     5|acc_norm       |↑  | 0.2750|±  |0.0175|
	|piqa          |      1|none  |     5|acc            |↑  | 0.6360|±  |0.0112|
	|              |       |none  |     5|acc_norm       |↑  | 0.6349|±  |0.0112|
	|sciq          |      1|none  |     5|acc            |↑  | 0.8000|±  |0.0127|
	|              |       |none  |     5|acc_norm       |↑  | 0.7830|±  |0.0130|
	|wikitext      |      2|none  |     5|bits_per_byte  |↓  | 0.9251|±  |   N/A|
	|              |       |none  |     5|byte_perplexity|↓  | 1.8988|±  |   N/A|
	|              |       |none  |     5|word_perplexity|↓  |30.8396|±  |   N/A|
	|winogrande    |      1|none  |     5|acc            |↑  | 0.5178|±  |0.0140|
	|wsc           |      1|none  |     5|acc            |↑  | 0.3846|±  |0.0479|
	```
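As a quick sanity check when reproducing these numbers, the two byte-level `wikitext` metrics are related by `bits_per_byte = log2(byte_perplexity)`. The sketch below applies that identity to the values reported in the table; the function name `bits_per_byte` is chosen here for illustration only.

```python
import math


def bits_per_byte(byte_perplexity: float) -> float:
    """Convert byte-level perplexity to bits per byte (log base 2)."""
    return math.log2(byte_perplexity)


# Values copied from the wikitext rows of the sample results.
reported_byte_perplexity = 1.8988
reported_bits_per_byte = 0.9251

derived = bits_per_byte(reported_byte_perplexity)

# The derived value should match the reported one up to rounding.
print(f"derived bits_per_byte: {derived:.4f}")
print(f"matches reported: {abs(derived - reported_bits_per_byte) < 5e-4}")
```

If a rerun yields `wikitext` values that do not satisfy this relation, the mismatch most likely comes from the evaluation setup rather than from the checkpoint.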

	If you find our work helpful, please consider starring ⭐ our [`GitHub repository`](https://github.com/HKUDS/SepLLM) and citing our paper. We greatly appreciate your support 😄
	```
	@inproceedings{chen2025sepllm,
	title={{SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator}},
	author={Chen, Guoxuan and Shi, Han and Li, Jiawei and Gao, Yihang and Ren, Xiaozhe and Chen, Yimeng and Jiang, Xin and Li, Zhenguo and Liu, Weiyang and Huang, Chao},
	booktitle={International Conference on Machine Learning},
	year={2025},
	note={Also available at arXiv:2412.12094}
	}
	```