Update README.md
README.md CHANGED
```diff
@@ -15,6 +15,8 @@ pipeline_tag: image-text-to-text
 inference: false
 license: mit
 ---
+# WIP: This FP8 Quantization does not yet work in vLLM
+
 # 🔥 InternVL3-8B-FP8-Dynamic: Optimized Vision-Language Model 🔥
 This is a **FP8 dynamic quantized** version of [OpenGVLab/InternVL3-8B](https://huggingface.co/OpenGVLab/InternVL3-8B), optimized for high-performance inference with vLLM.
 The model utilizes **dynamic FP8 quantization** for optimal ease of use and deployment, achieving significant speedup with minimal accuracy degradation on vision-language tasks.
@@ -52,7 +54,7 @@ print(response[0].outputs[0].text)
 
 ## 🏗️ Technical Specifications
 ### Hardware Requirements
-- **Inference**:
+- **Inference**: 7.8GB VRAM (+ Context)
 - **Supported GPUs**: H100, L40S, A100 (80GB), RTX 4090 (2x for tensor parallelism)
 - **GPU Architecture**: Ada Lovelace, Hopper (for optimal FP8 performance)
 ### Quantization Details
```
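For reference alongside the `print(response[0].outputs[0].text)` context line in the second hunk, here is a minimal sketch of how a pre-quantized FP8-dynamic checkpoint like this one is typically served with vLLM. The repo id, prompt, and sampling settings are illustrative assumptions rather than the model card's exact snippet, and per the WIP note added in this commit, this particular checkpoint may not load until vLLM support lands.

```python
# Illustrative sketch only: per the commit's WIP note, this checkpoint
# may not yet load in vLLM. Repo id, prompt, and settings are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="OpenGVLab/InternVL3-8B",  # swap in the FP8-dynamic repo id
    trust_remote_code=True,          # InternVL ships custom model code
    tensor_parallel_size=2,          # e.g. 2x RTX 4090, per the hardware list
    max_model_len=8192,
)

sampling = SamplingParams(temperature=0.0, max_tokens=128)
response = llm.generate("Describe this image.", sampling)
print(response[0].outputs[0].text)   # matches the README's usage line
```

One reason FP8-dynamic is billed as "optimal ease of use" in the card: activation scales are computed on the fly at inference time, so no calibration dataset is needed when loading the checkpoint.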