The following accuracy results were obtained with lm-eval using the Hugging Face (HF) backend:

```shell
lm_eval \
  --model hf \
  --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
  --tasks "niah_single_1" \
  --write_out \
  --batch_size 1 \
  --output_path "niah_single_1.json" \
  --show_config
```
Replace `niah_single_1` with `niah_single_2`, `niah_single_3`, `niah_multikey_1`, `niah_multikey_2`, and `niah_multikey_3` to run the remaining tasks.
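Running each task by hand can be tedious; the substitution above can be sketched as a small shell loop. It is written as a dry run (the command is `echo`ed so it can be inspected); drop the `echo` to execute for real:

```shell
# Dry-run loop over all six NIAH tasks; remove `echo` to actually run lm_eval.
for task in niah_single_1 niah_single_2 niah_single_3 \
            niah_multikey_1 niah_multikey_2 niah_multikey_3; do
  echo lm_eval \
    --model hf \
    --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
    --tasks "$task" \
    --write_out \
    --batch_size 1 \
    --output_path "${task}.json" \
    --show_config
done
```

Each run writes its results to a per-task JSON file (`niah_single_1.json`, `niah_single_2.json`, …).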
Accuracy with HF

| Category | Task | meta-llama/Llama-3.1-8B-Instruct | nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8 | Recovery (%) |
|---|---|---|---|---|
| NIAH | niah_single_1 (100K) | OOM | 0.0 | 0.0 |
| NIAH | niah_single_2 (100K) | OOM | 0.0 | 0.0 |
| NIAH | niah_single_3 (100K) | OOM | 0.0 | 0.0 |
| NIAH | niah_multikey_1 (100K) | OOM | 0.0 | 0.0 |
| NIAH | niah_multikey_2 (100K) | OOM | 0.0 | 0.0 |
| NIAH | niah_multikey_3 (100K) | OOM | 0.0 | 0.0 |
The following accuracy results were obtained with lm-eval using the vLLM backend:

```shell
lm_eval \
  --model vllm \
  --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,add_bos_token=True,max_model_len=131072,tensor_parallel_size=1,gpu_memory_utilization=0.7,enable_chunked_prefill=True,trust_remote_code=True \
  --tasks "niah_single_1" \
  --write_out \
  --batch_size 1 \
  --output_path "niah_single_1.json" \
  --show_config
```
Replace `niah_single_1` with `niah_single_2`, `niah_single_3`, `niah_multikey_1`, `niah_multikey_2`, and `niah_multikey_3` to run the remaining tasks.
Accuracy with vLLM

| Category | Task | meta-llama/Llama-3.1-8B-Instruct | nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8 | Recovery (%) |
|---|---|---|---|---|
| LongBench V1 | Gov Report | 32.16 | 20.82 | 64.73 |
| LongBench V1 | 2WikimQA | 17.35 | 18.53 | 106.8 |
| LongBench V1 | Qasper | 18.22 | 26.6 | 145 |
| LongBench V1 | MultifieldQA | 31.11 | 21.95 | 70.55 |
| LongBench V1 | HotpotQA | 16.23 | 4.86 | 29.9 |
| LongBench V1 | Musique | 9.11 | 0.94 | 10.31 |
| NIAH | niah_single_1 (4K) | 100.0 | 100.0 | 100.0 |
| NIAH | niah_single_2 (4K) | 100.0 | 100.0 | 100.0 |
| NIAH | niah_single_3 (4K) | 100.0 | 100.0 | 100.0 |
| NIAH | niah_multikey_1 (4K) | 100.0 | 100.0 | 100.0 |
| NIAH | niah_multikey_2 (4K) | 100.0 | 100.0 | 100.0 |
| NIAH | niah_multikey_3 (4K) | 100.0 | 100.0 | 100.0 |
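The Recovery column appears to be the quantized model's score expressed as a percentage of the baseline score. A minimal sketch, using the Gov Report row above (note the table's 64.73 reflects slightly different rounding):

```python
def recovery(baseline: float, quantized: float) -> float:
    """Quantized score as a percentage of the baseline score."""
    return quantized / baseline * 100.0

# Gov Report row from the vLLM table: baseline 32.16, quantized 20.82.
print(round(recovery(32.16, 20.82), 2))  # -> 64.74
```

Scores above the baseline (e.g. 2WikimQA) give a recovery above 100%, and identical scores (the 4K NIAH rows) give exactly 100.0.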