brandonbeiler committed (verified)
Commit 50bc29c · Parent: b5c3e98

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -15,6 +15,8 @@ pipeline_tag: image-text-to-text
 inference: false
 license: mit
 ---
+# WIP: This FP8 Quantization does not yet work in vLLM
+
 # 🔥 InternVL3-8B-FP8-Dynamic: Optimized Vision-Language Model 🔥
 This is a **FP8 dynamic quantized** version of [OpenGVLab/InternVL3-8B](https://huggingface.co/OpenGVLab/InternVL3-8B), optimized for high-performance inference with vLLM.
 The model utilizes **dynamic FP8 quantization** for optimal ease of use and deployment, achieving significant speedup with minimal accuracy degradation on vision-language tasks.
@@ -52,7 +54,7 @@ print(response[0].outputs[0].text)
 
 ## 🏗️ Technical Specifications
 ### Hardware Requirements
-- **Inference**: ? VRAM
+- **Inference**: 7.8GB VRAM (+ Context)
 - **Supported GPUs**: H100, L40S, A100 (80GB), RTX 4090 (2x for tensor parallelism)
 - **GPU Architecture**: Ada Lovelace, Hopper (for optimal FP8 performance)
 ### Quantization Details
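
As a footnote to the diff above: the second hunk header references the README's vLLM quickstart (`print(response[0].outputs[0].text)`), which is not shown in this commit. Below is a minimal sketch of how such a quickstart would load this checkpoint with vLLM's offline `LLM` API; the repo id, `max_model_len`, and prompt are assumptions not taken from this commit, and the commit itself adds a WIP note that the FP8 checkpoint does not yet run in vLLM, so treat it as illustrative only.

```python
# Minimal sketch, not the README's verbatim quickstart.
# Assumptions: repo id, max_model_len, and prompt; the WIP note above says
# this FP8 checkpoint does not yet work in vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="brandonbeiler/InternVL3-8B-FP8-Dynamic",  # assumed Hugging Face repo id
    trust_remote_code=True,   # InternVL3 ships custom modeling/tokenizer code
    max_model_len=8192,       # assumed; weights take ~7.8GB, context uses extra VRAM
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

# Text-only smoke test; image inputs would be passed via multi_modal_data.
response = llm.generate(
    "Summarize FP8 dynamic quantization in one sentence.",
    sampling_params,
)
print(response[0].outputs[0].text)
```

For the two-GPU RTX 4090 setup listed under Hardware Requirements, the same constructor would additionally take `tensor_parallel_size=2`.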