uyzhang committed
Commit 17774f4 · Parent(s): 7f459c4

update readme about vLLM

Files changed (1): README.md (+154 -0)

README.md CHANGED
@@ -31,6 +31,7 @@ This dataset enables Bee-8B to achieve exceptional performance, particularly in

- **State-of-the-Art Open Model:** Our model, **Bee-8B**, achieves state-of-the-art performance among fully open MLLMs and is highly competitive with recent semi-open models like InternVL3.5-8B, demonstrating the power of high-quality data.

## News

- **[2025.10.20]** 🚀 **vLLM Support is Here!** Bee-8B now supports high-performance inference with [vLLM](https://github.com/vllm-project/vllm), enabling faster and more efficient deployment for production use cases.

- **[2025.10.13]** 🐝 **Bee-8B is Released!** Our model is now publicly available. You can download it from [Hugging Face](https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995).
@@ -101,6 +102,159 @@ output_text = processor.decode(output_ids, skip_special_tokens=True)
### Using vLLM for High-Performance Inference

#### Install vLLM

> [!IMPORTANT]
> Bee-8B support will be officially available in vLLM **v0.11.1**. Until then, please install vLLM from source:

```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install --editable .
```

Once vLLM v0.11.1 is released, you will be able to install it directly via pip (the quotes keep the shell from treating `>=` as a redirect):

```bash
pip install "vllm>=0.11.1"
```
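Either way, a quick check that the expected build is on your path can save a failed model load later. This is a minimal sanity check, not part of the official instructions:

```bash
# Should print 0.11.1 or the dev version you just built from source.
python -c "import vllm; print(vllm.__version__)"
```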
#### Offline Inference

```python
from transformers import AutoProcessor
from vllm import LLM, SamplingParams
from PIL import Image
import requests


def main():
    model_path = "Open-Bee/Bee-8B-RL"

    # Initialize the engine. limit_mm_per_prompt caps the number of images
    # per request; lower gpu_memory_utilization if you share the GPU.
    llm = LLM(
        model=model_path,
        limit_mm_per_prompt={"image": 5},
        trust_remote_code=True,
        tensor_parallel_size=1,
        gpu_memory_utilization=0.8,
    )

    sampling_params = SamplingParams(
        temperature=0.6,
        max_tokens=16384,
    )

    image_url = "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png"
    image = Image.open(requests.get(image_url, stream=True).raw)

    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {
                "type": "text",
                "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model).",
            },
        ],
    }]

    # Render the chat template to a prompt string; the image itself is
    # passed separately through multi_modal_data.
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
    prompt = processor.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True,
    )

    llm_inputs = {
        "prompt": prompt,
        "multi_modal_data": {"image": image},
    }

    outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
    generated_text = outputs[0].outputs[0].text

    print(generated_text)


if __name__ == "__main__":
    main()
```
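Since the engine above is configured with `limit_mm_per_prompt={"image": 5}`, a single request can carry several images. The sketch below is a hypothetical extension of `main()` (it assumes `llm`, `processor`, `image`, and `sampling_params` from the script above are in scope) and simply reuses the logo twice for illustration; in vLLM, multiple images are passed as a list under the `"image"` key:

```python
# Hypothetical multi-image request: one message entry per image,
# with the images passed as a list in multi_modal_data.
multi_messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "image", "image": image},
        {"type": "text", "text": "Describe the differences between these two images."},
    ],
}]
prompt = processor.apply_chat_template(
    multi_messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate(
    [{"prompt": prompt, "multi_modal_data": {"image": [image, image]}}],
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```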
#### Online Serving

- Start the server:

```bash
vllm serve \
  Open-Bee/Bee-8B-RL \
  --served-model-name bee-8b-rl \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.8 \
  --host 0.0.0.0 \
  --port 8000 \
  --trust-remote-code
```
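Once the server reports it is ready, you can confirm the model is being served before pointing a client at it; `/v1/models` is part of vLLM's OpenAI-compatible API:

```bash
# Expect a JSON model list containing "bee-8b-rl".
curl http://localhost:8000/v1/models
```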
- Use the OpenAI Python client to query the server:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server does not check the API key,
# but the client requires a non-empty value.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# The image is referenced by URL; the server downloads and preprocesses it.
image_messages = [{
    "role": "user",
    "content": [
        {
            "type": "image_url",
            "image_url": {
                "url": "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png"
            },
        },
        {
            "type": "text",
            "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model).",
        },
    ],
}]

chat_response = client.chat.completions.create(
    model="bee-8b-rl",
    messages=image_messages,
    max_tokens=16384,
    # chat_template_kwargs is forwarded to the chat template,
    # mirroring enable_thinking=True in the offline example.
    extra_body={
        "chat_template_kwargs": {
            "enable_thinking": True
        },
    },
)
print("Chat response:", chat_response.choices[0].message.content)
```
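For interactive use, the same request can stream tokens as they are generated; this variant uses the standard OpenAI streaming interface and reuses `client` and `image_messages` from above:

```python
# Stream the reply chunk by chunk instead of waiting for the full message.
stream = client.chat.completions.create(
    model="bee-8b-rl",
    messages=image_messages,
    max_tokens=16384,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```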
## Experimental Results

<figure align="center">