---
license: apache-2.0
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- llm-compressor
datasets:
- HuggingFaceH4/ultrachat_200k
---

This is [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) quantized to 4-bit (NVFP4), for both weights and activations, with [LLM Compressor](https://github.com/vllm-project/llm-compressor).
The calibration step used 512 samples of up to 2,048 tokens each, with the chat template applied, drawn from [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k).
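
For reference, here is a minimal sketch of what such a calibration and quantization run looks like, modeled on LLM Compressor's published NVFP4 examples. The seed, output directory, and preprocessing details are illustrative assumptions, not the exact script behind this checkpoint.

```python
# Illustrative NVFP4 quantization sketch (assumed settings, not the exact
# script used for this checkpoint). Requires: pip install llmcompressor
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"
NUM_CALIBRATION_SAMPLES = 512
MAX_SEQUENCE_LENGTH = 2048

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 512 calibration samples from ultrachat_200k, chat template applied.
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))
ds = ds.map(lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)})

# Tokenize, truncating to the 2,048-token calibration length.
def tokenize(sample):
    return tokenizer(
        sample["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )

ds = ds.map(tokenize, remove_columns=ds.column_names)

# NVFP4 scheme: 4-bit weights AND activations; lm_head left unquantized.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)

# Save in compressed-tensors format so vLLM can load it directly.
model.save_pretrained("Llama-3.1-8B-Instruct-NVFP4", save_compressed=True)
tokenizer.save_pretrained("Llama-3.1-8B-Instruct-NVFP4")
```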

The quantization has been done, tested, and evaluated by The Kaitchup.
The model is compatible with vLLM. Running it on a Blackwell GPU yields more than 2x higher throughput.
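
A minimal vLLM usage sketch is below; the model id is a placeholder, so substitute this repository's actual id from the model page.

```python
# Minimal vLLM inference sketch. Replace the placeholder model id with this
# repository's actual id.
from vllm import LLM, SamplingParams

llm = LLM(model="kaitchup/Llama-3.1-8B-Instruct-NVFP4")  # placeholder id
sampling = SamplingParams(temperature=0.6, max_tokens=256)

conversation = [{"role": "user", "content": "Explain NVFP4 quantization briefly."}]
outputs = llm.chat(conversation, sampling)
print(outputs[0].outputs[0].text)
```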

More details in this article:
[NVFP4: Same Accuracy with 2.3x Higher Throughput for 4-Bit LLMs](https://kaitchup.substack.com/p/nvfp4-same-accuracy-with-23-higher)

- **Developed by:** [The Kaitchup](https://kaitchup.substack.com/)
- **License:** Apache 2.0

## How to Support My Work

Subscribe to [The Kaitchup](https://kaitchup.substack.com/subscribe).
Or, for a one-time contribution, here is my ko-fi link: [https://ko-fi.com/bnjmn_marie](https://ko-fi.com/bnjmn_marie)

This helps me a lot to continue quantizing and evaluating models for free.