uyzhang committed
Commit 17774f4 · Parent(s): 7f459c4

update readme about vLLM

Files changed (1): README.md (+154 -0)

README.md CHANGED
@@ -31,6 +31,7 @@ This dataset enables Bee-8B to achieve exceptional performance, particularly in

- **State-of-the-Art Open Model:** Our model, **Bee-8B**, achieves state-of-the-art performance among fully open MLLMs and is highly competitive with recent semi-open models like InternVL3.5-8B, demonstrating the power of high-quality data.

## News

- **[2025.10.20]** 🚀 **vLLM Support is Here!** Bee-8B now supports high-performance inference with [vLLM](https://github.com/vllm-project/vllm), enabling faster and more efficient deployment for production use cases.

- **[2025.10.13]** 🐝 **Bee-8B is Released!** Our model is now publicly available. You can download it from [Hugging Face](https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995).
@@ -101,6 +102,159 @@ output_text = processor.decode(output_ids, skip_special_tokens=True)
### Using vLLM for High-Performance Inference

#### Install vLLM

> [!IMPORTANT]
> Bee-8B support will be officially available in vLLM **v0.11.1**. Until then, please install vLLM from source:

```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install --editable .
```

Once vLLM v0.11.1 is released, you will be able to install it directly via pip (the quotes keep the shell from treating `>=` as a redirect):

```bash
pip install "vllm>=0.11.1"
```
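Either way, a quick check that the expected build is on your path can save a failed model load later. This is a minimal sanity check, not part of the official instructions:

```bash
# Should print 0.11.1 or the dev version you just built from source.
python -c "import vllm; print(vllm.__version__)"
```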
#### Offline Inference

```python
from transformers import AutoProcessor
from vllm import LLM, SamplingParams
from PIL import Image
import requests


def main():
    model_path = "Open-Bee/Bee-8B-RL"

    # Initialize the engine. limit_mm_per_prompt caps the number of images
    # per request; lower gpu_memory_utilization if you share the GPU.
    llm = LLM(
        model=model_path,
        limit_mm_per_prompt={"image": 5},
        trust_remote_code=True,
        tensor_parallel_size=1,
        gpu_memory_utilization=0.8,
    )

    sampling_params = SamplingParams(
        temperature=0.6,
        max_tokens=16384,
    )

    image_url = "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png"
    image = Image.open(requests.get(image_url, stream=True).raw)

    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {
                "type": "text",
                "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model).",
            },
        ],
    }]

    # Render the chat template to a prompt string; the image itself is
    # passed separately through multi_modal_data.
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
    prompt = processor.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True,
    )

    llm_inputs = {
        "prompt": prompt,
        "multi_modal_data": {"image": image},
    }

    outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
    generated_text = outputs[0].outputs[0].text

    print(generated_text)


if __name__ == "__main__":
    main()
```
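Since the engine above is configured with `limit_mm_per_prompt={"image": 5}`, a single request can carry several images. The sketch below is a hypothetical extension of `main()` (it assumes `llm`, `processor`, `image`, and `sampling_params` from the script above are in scope) and simply reuses the logo twice for illustration; in vLLM, multiple images are passed as a list under the `"image"` key:

```python
# Hypothetical multi-image request: one message entry per image,
# with the images passed as a list in multi_modal_data.
multi_messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "image", "image": image},
        {"type": "text", "text": "Describe the differences between these two images."},
    ],
}]
prompt = processor.apply_chat_template(
    multi_messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate(
    [{"prompt": prompt, "multi_modal_data": {"image": [image, image]}}],
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```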
#### Online Serving

- Start the server:

```bash
vllm serve \
  Open-Bee/Bee-8B-RL \
  --served-model-name bee-8b-rl \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.8 \
  --host 0.0.0.0 \
  --port 8000 \
  --trust-remote-code
```
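Once the server reports it is ready, you can confirm the model is being served before pointing a client at it; `/v1/models` is part of vLLM's OpenAI-compatible API:

```bash
# Expect a JSON model list containing "bee-8b-rl".
curl http://localhost:8000/v1/models
```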
- Use the OpenAI Python client to query the server:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server does not check the API key,
# but the client requires a non-empty value.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# The image is referenced by URL; the server downloads and preprocesses it.
image_messages = [{
    "role": "user",
    "content": [
        {
            "type": "image_url",
            "image_url": {
                "url": "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png"
            },
        },
        {
            "type": "text",
            "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model).",
        },
    ],
}]

chat_response = client.chat.completions.create(
    model="bee-8b-rl",
    messages=image_messages,
    max_tokens=16384,
    # chat_template_kwargs is forwarded to the chat template,
    # mirroring enable_thinking=True in the offline example.
    extra_body={
        "chat_template_kwargs": {
            "enable_thinking": True
        },
    },
)
print("Chat response:", chat_response.choices[0].message.content)
```
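For interactive use, the same request can stream tokens as they are generated; this variant uses the standard OpenAI streaming interface and reuses `client` and `image_messages` from above:

```python
# Stream the reply chunk by chunk instead of waiting for the full message.
stream = client.chat.completions.create(
    model="bee-8b-rl",
    messages=image_messages,
    max_tokens=16384,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```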
## Experimental Results

<figure align="center">