brandonbeiler committed on
Commit 9328e13 · verified · 1 Parent(s): e6e579b

Update README.md

Files changed (1)
  1. README.md +27 -9
README.md CHANGED
@@ -1,20 +1,21 @@
 ---
-language:
-- en
-- zh
 tags:
 - fp8
-- quantization
-- dynamic
-- vision-language
-- multimodal
+- fp8-dynamic
 - vllm
 - llm-compressor
 - internvl3.5
-
+- internvl
+language:
+- multilingual
 pipeline_tag: image-text-to-text
 inference: false
 license: mit
+base_model:
+- OpenGVLab/InternVL3_5-38B
+datasets:
+- OpenGVLab/MMPR-v1.2
+library_name: vllm
 ---
 
 # InternVL3.5 38B FP8
@@ -41,7 +42,22 @@ The quantization process uses a specialized recipe that preserves the model's co
 | **Quantization Library** | [LLM Compressor](https://github.com/vllm-project/llm-compressor) v0.7.1 |
 | **Quantized By** | [brandonbeiler](https://huggingface.co/brandonbeiler) |
 
-## Usage with vLLM
+## With vLLM OpenAI-Compatible Server
+
+You can serve the model using vLLM's OpenAI-compatible API server.
+
+```bash
+python -m vllm.entrypoints.openai.api_server \
+  --model brandonbeiler/InternVL3_5-38B-FP8-Dynamic \
+  --quantization compressed-tensors \
+  --served-model-name internvl3_5-38b \
+  --reasoning-parser qwen3 \
+  --trust-remote-code \
+  --max-model-len 32768 \
+  --tensor-parallel-size 1  # Adjust based on your GPU setup
+```
+
+## Usage with vLLM in Python
 
 The following snippet demonstrates inference using the vLLM library.
 
@@ -69,6 +85,8 @@ response = model.generate(prompt, sampling_params)
 print(response[0].outputs[0].text)
 ```
 
+
+
 ## Technical Specifications
 
 ### Hardware Requirements
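
Once the server from the newly added serving command is running, it can be queried with any OpenAI-compatible client. The sketch below is illustrative and not part of this commit: it assumes the server listens on vLLM's default `http://localhost:8000/v1`, that the `openai` Python package is installed, that the model is reachable under the `--served-model-name` value `internvl3_5-38b`, and the image URL is a placeholder.

```python
# Minimal client-side sketch (not part of this commit): query the vLLM
# OpenAI-compatible server started by the README's serving command.
# Assumptions: default vLLM address http://localhost:8000/v1 and the
# served model name "internvl3_5-38b" from --served-model-name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="internvl3_5-38b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                # Placeholder image URL; replace with a real one.
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }
    ],
    max_tokens=256,
)

print(response.choices[0].message.content)
```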