Update README.md
README.md CHANGED
```diff
@@ -15,6 +15,8 @@ pipeline_tag: image-text-to-text
 inference: false
 license: mit
 ---
+# WIP: This FP8 Quantization does not yet work in vLLM
+
 # 🔥 InternVL3-8B-FP8-Dynamic: Optimized Vision-Language Model 🔥
 This is a **FP8 dynamic quantized** version of [OpenGVLab/InternVL3-8B](https://huggingface.co/OpenGVLab/InternVL3-8B), optimized for high-performance inference with vLLM.
 The model utilizes **dynamic FP8 quantization** for optimal ease of use and deployment, achieving significant speedup with minimal accuracy degradation on vision-language tasks.
@@ -52,7 +54,7 @@ print(response[0].outputs[0].text)
 
 ## 🏗️ Technical Specifications
 ### Hardware Requirements
-- **Inference**:
+- **Inference**: 7.8GB VRAM (+ Context)
 - **Supported GPUs**: H100, L40S, A100 (80GB), RTX 4090 (2x for tensor parallelism)
 - **GPU Architecture**: Ada Lovelace, Hopper (for optimal FP8 performance)
 ### Quantization Details
```
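For reference alongside the `print(response[0].outputs[0].text)` context line in the second hunk, here is a minimal sketch of how a pre-quantized FP8-dynamic checkpoint like this one is typically served with vLLM. The repo id, prompt, and sampling settings are illustrative assumptions rather than the model card's exact snippet, and per the WIP note added in this commit, this particular checkpoint may not load until vLLM support lands.

```python
# Illustrative sketch only: per the commit's WIP note, this checkpoint
# may not yet load in vLLM. Repo id, prompt, and settings are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="OpenGVLab/InternVL3-8B",  # swap in the FP8-dynamic repo id
    trust_remote_code=True,          # InternVL ships custom model code
    tensor_parallel_size=2,          # e.g. 2x RTX 4090, per the hardware list
    max_model_len=8192,
)

sampling = SamplingParams(temperature=0.0, max_tokens=128)
response = llm.generate("Describe this image.", sampling)
print(response[0].outputs[0].text)   # matches the README's usage line
```

One reason FP8-dynamic is billed as "optimal ease of use" in the card: activation scales are computed on the fly at inference time, so no calibration dataset is needed when loading the checkpoint.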