Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
[Homepage] [arXiv Paper] [Models] [Datasets (coming soon)] [Code (coming soon)]
Introduction
We introduce Bee-8B, a new state-of-the-art, fully open 8B Multimodal Large Language Model (MLLM) designed to close the performance gap with proprietary models by focusing on data quality.
Bee-8B is trained on our new Honey-Data-15M corpus, a high-quality supervised fine-tuning (SFT) dataset of approximately 15 million samples. This dataset was meticulously created with our transparent, adaptable, and open-source data curation pipeline, HoneyPipe, which systematically cleans noisy data and enriches it with a novel dual-level (short and long) Chain-of-Thought (CoT) strategy.
This dataset enables Bee-8B to achieve exceptional performance, particularly in complex reasoning, establishing a new standard for fully open MLLMs.
Key Features
- High-Quality, Large-Scale Dataset: We release Honey-Data-15M, a new 15M-sample SFT corpus. It has undergone extensive cleaning to remove widespread noise and has been enriched with dual-level CoT reasoning to enhance advanced problem-solving capabilities.
- Fully Open-Source Data Curation Suite: We provide not just the data, but the entire methodology. HoneyPipe and its underlying framework DataStudio offer the community a transparent and reproducible pipeline, moving beyond static dataset releases.
- State-of-the-Art Open Model: Our model, Bee-8B, achieves state-of-the-art performance among fully open MLLMs and is highly competitive with recent semi-open models like InternVL3.5-8B, demonstrating the power of high-quality data.
News
[2025.10.20] vLLM Support is Here! Bee-8B now supports high-performance inference with vLLM, enabling faster and more efficient deployment for production use cases.
[2025.10.13] Bee-8B is Released! Our model is now publicly available. You can download it from Hugging Face.
Quickstart
Below, we provide simple examples to show how to use Bee-8B with Transformers. You can dynamically control the model's response by selecting one of two modes: set enable_thinking=True for thinking mode, or enable_thinking=False for non-thinking mode. The default is thinking mode.
Using Transformers to Chat
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor
model_path = "Open-Bee/Bee-8B-RL"
# Load model
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")
# Load processor
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
# Define conversation messages
messages = [{
    "role": "user",
    "content": [
        {
            "type": "image",
            "image": "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png",
        },
        {
            "type": "text",
            "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model)."
        },
    ],
}]
# Apply chat template
text = processor.apply_chat_template(messages,
                                     tokenize=False,
                                     add_generation_prompt=True,
                                     enable_thinking=True)
# Load image
image_url = "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png"
image = Image.open(requests.get(image_url, stream=True).raw)
# Process inputs
inputs = processor(images=image, text=text, return_tensors="pt").to("cuda")
# Generate output
generated_ids = model.generate(**inputs, max_new_tokens=16384, temperature=0.6)
output_ids = generated_ids[0][len(inputs.input_ids[0]):]
# Decode output
output_text = processor.decode(output_ids, skip_special_tokens=True)
# Print result
print(output_text)
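In thinking mode, the decoded text typically contains a reasoning trace followed by the final answer. The following is a minimal sketch for separating the two. It assumes the reasoning is wrapped in Qwen3-style `<think>...</think>` tags, which is an assumption rather than something this card confirms, so check the model's chat template before relying on it. `split_thinking` is a hypothetical helper and is not part of the Bee-8B codebase.

```python
def split_thinking(output_text, open_tag="<think>", close_tag="</think>"):
    """Split decoded text into (reasoning, answer).

    Assumes the reasoning is delimited by open_tag/close_tag (an assumption
    about the output format). If the closing tag is absent, the whole text
    is returned as the answer.
    """
    head, sep, tail = output_text.partition(close_tag)
    if not sep:
        # No closing tag: non-thinking output, or a truncated trace.
        return "", output_text.strip()
    # The opening tag may live in the prompt rather than in the generated
    # text, so remove it only if it is present.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking(
    "<think>Bees make honey.</think>\n\nBee-8B: sweet results.")
print(answer)
```

In the example above, the helper would be applied to `output_text` after decoding.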
Using vLLM for High-Performance Inference
Install vLLM
Bee-8B support will be officially available in vLLM v0.11.1. Until then, please install vLLM from source:
git clone https://github.com/vllm-project/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install --editable .
Once vLLM v0.11.1 is released, you will be able to install it directly via pip:
pip install "vllm>=0.11.1"
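To confirm that an environment meets the minimum version stated above before running the examples, a rough check along these lines may help. `version_tuple` and `meets_minimum` are hypothetical helpers introduced here for illustration. The parsing only looks at the leading numeric components, so pre-release and source builds (for example a `0.11.1rc...` development version) are treated as equal to the release and should be verified manually.

```python
import re
from importlib import metadata

MIN_VLLM = "0.11.1"  # minimum release stated in this section


def version_tuple(version):
    """Parse the leading numeric components, e.g. '0.11.1rc2' -> (0, 11, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum=MIN_VLLM):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = metadata.version("vllm")
    print(f"vllm {installed}: meets minimum = {meets_minimum(installed)}")
except metadata.PackageNotFoundError:
    print("vllm is not installed")
```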
Offline Inference
from transformers import AutoProcessor
from vllm import LLM, SamplingParams
from PIL import Image
import requests
def main():
    model_path = "Open-Bee/Bee-8B-RL"
    llm = LLM(
        model=model_path,
        limit_mm_per_prompt={"image": 5},
        trust_remote_code=True,
        tensor_parallel_size=1,
        gpu_memory_utilization=0.8,
    )
    sampling_params = SamplingParams(
        temperature=0.6,
        max_tokens=16384,
    )
    image_url = "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png"
    image = Image.open(requests.get(image_url, stream=True).raw)
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": image
                },
                {
                    "type": "text",
                    "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model)."
                },
            ],
        },
    ]
    processor = AutoProcessor.from_pretrained(model_path,
                                              trust_remote_code=True)
    prompt = processor.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True,
    )
    mm_data = {"image": image}
    llm_inputs = {
        "prompt": prompt,
        "multi_modal_data": mm_data,
    }
    outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
    generated_text = outputs[0].outputs[0].text
    print(generated_text)


if __name__ == '__main__':
    main()
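The offline example sets `limit_mm_per_prompt={"image": 5}`, which allows up to five images in one prompt. The sketch below shows one way the request could be assembled for several images. `build_multi_image_request` is a hypothetical helper, and whether Bee-8B's chat template emits one image placeholder per `"image"` entry is an assumption that should be checked against the actual template.

```python
def build_multi_image_request(images, question, max_images=5):
    """Return (messages, mm_data) for a single prompt with several images.

    max_images mirrors limit_mm_per_prompt={"image": 5} from the offline
    example; adjust it if the engine is configured differently.
    """
    if not 1 <= len(images) <= max_images:
        raise ValueError(f"expected 1..{max_images} images, got {len(images)}")
    # One "image" entry per picture, followed by the text question.
    content = [{"type": "image", "image": img} for img in images]
    content.append({"type": "text", "text": question})
    messages = [{"role": "user", "content": content}]
    # For multi-image prompts, vLLM takes a list under the "image" key.
    mm_data = {"image": list(images)}
    return messages, mm_data


# Strings stand in for PIL images here so the snippet runs on its own.
messages, mm_data = build_multi_image_request(
    ["img_a", "img_b"], "Compare the two pictures.")
print(len(messages[0]["content"]), len(mm_data["image"]))
```

The returned `messages` would go through `processor.apply_chat_template` as in `main()`, and `mm_data` would replace the single-image dictionary in `llm_inputs`.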
Online Serving
- Start the server
vllm serve \
    Open-Bee/Bee-8B-RL \
    --served-model-name bee-8b-rl \
    --tensor-parallel-size 8 \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8000 \
    --trust-remote-code
- Query the server with the OpenAI Python client
from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
# Image URL
image_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png"
                },
            },
            {
                "type": "text",
                "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model)."
            },
        ],
    },
]
chat_response = client.chat.completions.create(
    model="bee-8b-rl",
    messages=image_messages,
    max_tokens=16384,
    extra_body={
        "chat_template_kwargs": {
            "enable_thinking": True
        },
    },
)
print("Chat response:", chat_response.choices[0].message.content)
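The request above references a remote image URL. For a local file, one common approach with OpenAI-compatible servers is to embed the image as a base64 data URL in the same `image_url` field. The sketch below builds such a message. `image_bytes_to_data_url` and `build_image_message` are hypothetical helpers, and acceptance of data URLs by a given deployment is an assumption worth verifying against the server's documentation.

```python
import base64
import mimetypes


def image_bytes_to_data_url(data, filename="image.png"):
    """Encode raw image bytes as a data URL; the MIME type is guessed from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def build_image_message(data_url, question):
    """Build a chat message list in the same shape as image_messages above."""
    return [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": question},
        ],
    }]


# Stand-in bytes; in practice read them with open(path, "rb").read().
data_url = image_bytes_to_data_url(b"\x89PNG\r\n\x1a\n", "logo.png")
local_messages = build_image_message(data_url, "Describe this image.")
print(local_messages[0]["content"][0]["image_url"]["url"][:22])
```

The resulting list can be passed as `messages` to `client.chat.completions.create` in place of `image_messages`.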
Experimental Results

- New State-of-the-Art: Bee-8B establishes a new performance standard for fully open MLLMs, proving highly competitive with recent semi-open models across a wide array of benchmarks.
- Excellence in Complex Reasoning: Thanks to the CoT-enriched Honey-Data-15M, Bee-8B shows its most significant advancements in complex math and reasoning. It achieves top scores on challenging benchmarks like MathVerse, LogicVista, and DynaMath.
- Superior Document and Chart Understanding: The model demonstrates powerful capabilities in analyzing structured visual data, securing the top rank on the CharXiv benchmark for both descriptive and reasoning questions.
Acknowledgements
Bee-8B builds on the architectures and codebases of R-4B, LLaVA-OneVision, SigLIP2, and Qwen3, and is evaluated using VLMEvalKit. We sincerely thank these projects for their outstanding contributions to the open-source community.