The following accuracy results were obtained with lm-eval using the Hugging Face (HF) backend:

```shell
lm_eval \
  --model hf \
  --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
  --tasks "niah_single_1" \
  --write_out \
  --batch_size 1 \
  --output_path "niah_single_1.json" \
  --show_config
```
Replace `niah_single_1` with `niah_single_2`, `niah_single_3`, `niah_multikey_1`, `niah_multikey_2`, and `niah_multikey_3` to run the remaining tasks.
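Running each task by hand can be tedious; the substitution above can be sketched as a small shell loop. It is written as a dry run (the command is `echo`ed so it can be inspected); drop the `echo` to execute for real:

```shell
# Dry-run loop over all six NIAH tasks; remove `echo` to actually run lm_eval.
for task in niah_single_1 niah_single_2 niah_single_3 \
            niah_multikey_1 niah_multikey_2 niah_multikey_3; do
  echo lm_eval \
    --model hf \
    --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
    --tasks "$task" \
    --write_out \
    --batch_size 1 \
    --output_path "${task}.json" \
    --show_config
done
```

Each run writes its results to a per-task JSON file (`niah_single_1.json`, `niah_single_2.json`, …).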
Accuracy with HF

| Category | Task | meta-llama/Llama-3.1-8B-Instruct | nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8 | Recovery (%) |
|---|---|---|---|---|
| NIAH | niah_single_1 (100K) | OOM | 0.0 | 0.0 |
| NIAH | niah_single_2 (100K) | OOM | 0.0 | 0.0 |
| NIAH | niah_single_3 (100K) | OOM | 0.0 | 0.0 |
| NIAH | niah_multikey_1 (100K) | OOM | 0.0 | 0.0 |
| NIAH | niah_multikey_2 (100K) | OOM | 0.0 | 0.0 |
| NIAH | niah_multikey_3 (100K) | OOM | 0.0 | 0.0 |
The following accuracy results were obtained with lm-eval using the vLLM backend:

```shell
lm_eval \
  --model vllm \
  --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,add_bos_token=True,max_model_len=131072,tensor_parallel_size=1,gpu_memory_utilization=0.7,enable_chunked_prefill=True,trust_remote_code=True \
  --tasks "niah_single_1" \
  --write_out \
  --batch_size 1 \
  --output_path "niah_single_1.json" \
  --show_config
```
Replace `niah_single_1` with `niah_single_2`, `niah_single_3`, `niah_multikey_1`, `niah_multikey_2`, and `niah_multikey_3` to run the remaining tasks.
Accuracy with vLLM

| Category | Task | meta-llama/Llama-3.1-8B-Instruct | nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8 | Recovery (%) |
|---|---|---|---|---|
| LongBench V1 | Gov Report | 32.16 | 20.82 | 64.73 |
| LongBench V1 | 2WikimQA | 17.35 | 18.53 | 106.8 |
| LongBench V1 | Qasper | 18.22 | 26.6 | 145 |
| LongBench V1 | MultifieldQA | 31.11 | 21.95 | 70.55 |
| LongBench V1 | HotpotQA | 16.23 | 4.86 | 29.9 |
| LongBench V1 | Musique | 9.11 | 0.94 | 10.31 |
| NIAH | niah_single_1 (4K) | 100.0 | 100.0 | 100.0 |
| NIAH | niah_single_2 (4K) | 100.0 | 100.0 | 100.0 |
| NIAH | niah_single_3 (4K) | 100.0 | 100.0 | 100.0 |
| NIAH | niah_multikey_1 (4K) | 100.0 | 100.0 | 100.0 |
| NIAH | niah_multikey_2 (4K) | 100.0 | 100.0 | 100.0 |
| NIAH | niah_multikey_3 (4K) | 100.0 | 100.0 | 100.0 |
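The Recovery column appears to be the quantized model's score expressed as a percentage of the baseline score. A minimal sketch, using the Gov Report row above (note the table's 64.73 reflects slightly different rounding):

```python
def recovery(baseline: float, quantized: float) -> float:
    """Quantized score as a percentage of the baseline score."""
    return quantized / baseline * 100.0

# Gov Report row from the vLLM table: baseline 32.16, quantized 20.82.
print(round(recovery(32.16, 20.82), 2))  # -> 64.74
```

Scores above the baseline (e.g. 2WikimQA) give a recovery above 100%, and identical scores (the 4K NIAH rows) give exactly 100.0.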