
The following accuracy results were measured with lm-eval using the HF (transformers) backend:

```bash
lm_eval \
  --model hf \
  --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
  --tasks "niah_single_1" \
  --write_out \
  --batch_size 1 \
  --output_path "niah_single_1.json" \
  --show_config
```

Replace niah_single_1 with niah_single_2, niah_single_3, niah_multikey_1, niah_multikey_2, or niah_multikey_3 to cover the remaining tasks; a loop that runs all six is sketched below.
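A small shell loop can run all six tasks in one pass. This is a convenience sketch, assuming the same lm_eval arguments as above; only the task name and output path change per run:

```bash
# Evaluate every NIAH task with the HF backend, writing one results file per task.
for task in niah_single_1 niah_single_2 niah_single_3 \
            niah_multikey_1 niah_multikey_2 niah_multikey_3; do
  lm_eval \
    --model hf \
    --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,device_map="auto",max_length=100000 \
    --tasks "$task" \
    --write_out \
    --batch_size 1 \
    --output_path "${task}.json" \
    --show_config
done
```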

**Accuracy with HF**

| Category | Task | meta-llama/Llama-3.1-8B-Instruct | nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8 | Recovery (%) |
|----------|------|----------------------------------|------------------------------------------------|--------------|
| NIAH | niah_single_1 (100K) | OOM | 0.0 | 0.0 |
| | niah_single_2 (100K) | OOM | 0.0 | 0.0 |
| | niah_single_3 (100K) | OOM | 0.0 | 0.0 |
| | niah_multikey_1 (100K) | OOM | 0.0 | 0.0 |
| | niah_multikey_2 (100K) | OOM | 0.0 | 0.0 |
| | niah_multikey_3 (100K) | OOM | 0.0 | 0.0 |

The following accuracy results were measured with lm-eval using the vLLM backend:

```bash
lm_eval \
  --model vllm \
  --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,add_bos_token=True,max_model_len=131072,tensor_parallel_size=1,gpu_memory_utilization=0.7,enable_chunked_prefill=True,trust_remote_code=True \
  --tasks "niah_single_1" \
  --write_out \
  --batch_size 1 \
  --output_path "niah_single_1.json" \
  --show_config
```

Replace niah_single_1 with niah_single_2, niah_single_3, niah_multikey_1, niah_multikey_2, or niah_multikey_3 to cover the remaining tasks; a loop for the vLLM backend is sketched below.
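The same loop pattern as in the HF section applies here; only the backend and model_args change. A sketch, assuming the vLLM arguments shown above:

```bash
# Evaluate every NIAH task with the vLLM backend, writing one results file per task.
for task in niah_single_1 niah_single_2 niah_single_3 \
            niah_multikey_1 niah_multikey_2 niah_multikey_3; do
  lm_eval \
    --model vllm \
    --model_args pretrained="nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8",dtype=auto,add_bos_token=True,max_model_len=131072,tensor_parallel_size=1,gpu_memory_utilization=0.7,enable_chunked_prefill=True,trust_remote_code=True \
    --tasks "$task" \
    --write_out \
    --batch_size 1 \
    --output_path "${task}.json" \
    --show_config
done
```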

**Accuracy with vLLM**

| Category | Task | meta-llama/Llama-3.1-8B-Instruct | nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8 | Recovery (%) |
|----------|------|----------------------------------|------------------------------------------------|--------------|
| LongBench V1 | Gov Report | 32.16 | 20.82 | 64.73 |
| | 2WikiMQA | 17.35 | 18.53 | 106.8 |
| | Qasper | 18.22 | 26.6 | 145 |
| | MultiFieldQA | 31.11 | 21.95 | 70.55 |
| | HotpotQA | 16.23 | 4.86 | 29.9 |
| | Musique | 9.11 | 0.94 | 10.31 |
| NIAH | niah_single_1 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_single_2 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_single_3 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_multikey_1 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_multikey_2 (4K) | 100.0 | 100.0 | 100.0 |
| | niah_multikey_3 (4K) | 100.0 | 100.0 | 100.0 |