R-4B: Incentivizing General-Purpose Auto-Thinking Capabilities in MLLMs via Bi-Mode Integration

[πŸ“š Arxiv Paper (Coming soon)] [πŸ€— Hugging Face] [πŸ€–οΈ ModelScope] [πŸ’» Code]

[Figure: R-4B performance overview]

⭐️ Introduction

In this repo, we present R-4B, a multimodal large language model designed for general-purpose auto-thinking: it autonomously switches between step-by-step thinking and direct response generation based on task complexity. This capability lets R-4B deliver high-quality responses while significantly improving inference efficiency and reducing computational cost.

The development of R-4B follows a two-stage training paradigm: (1) Bi-mode Annealing, which establishes both thinking and non-thinking capabilities for VQA; and (2) Bi-mode Policy Optimization (BPO), which enables the model to adaptively switch between thinking and non-thinking modes based on input demands.

πŸš€ Key Features

  • 🧠 Think Smart, Act Fast: Adaptive & Controllable Thinking! Our model provides three-mode control over the response process.

    • Auto-thinking Mode: Unleash auto-thinking that works across general topics, from simple Q&A to complex scientific analysis. It saves time and computation by thinking only when it matters.
    • Manual Control Supported: Explicitly command the model to use its thinking or non-thinking capabilities, giving you full control over every task.
  • πŸ† **Strong Performance, Open for Everyone!** Our model is fully open-source and achieves state-of-the-art performance among models of comparable size.

πŸ“’ News

  • [2025.08.20] πŸš€ vLLM Support is Here! Our R-4B model is now fully compatible with vLLM for high-performance inference.
  • [2025.08.18] πŸ† Top Rank Achieved! We are thrilled to announce that R-4B is now ranked #1 among all open-source models on the OpenCompass Multi-modal Reasoning Leaderboard!
  • [2025.08.11] πŸ₯‡ Another #1! R-4B ranks first among models under 20B parameters on the OpenCompass Multi-modal Academic Leaderboard!
  • [2025.08.05] πŸŽ‰ R-4B is Released! Our model is now publicly available. You can download it from Hugging Face.

πŸ”₯ Quickstart

Below, we provide simple examples to show how to use R-4B with πŸ€— Transformers.

Using πŸ€— Transformers to Chat

Users can dynamically control the model's response by selecting one of three modes (auto-thinking, thinking, or non-thinking) with the thinking_mode argument: thinking_mode="auto" for auto-thinking mode, thinking_mode="long" for thinking mode, and thinking_mode="short" for non-thinking mode.

import requests
import torch
from transformers import AutoModel, AutoProcessor
from PIL import Image

model_path = "YannQi/R-4B"

# Load the model; trust_remote_code is required because R-4B ships custom modeling code.
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float32,
    trust_remote_code=True,
).to("cuda")

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "http://images.cocodataset.org/val2017/000000039769.jpg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference; thinking_mode accepts "auto", "long", or "short"
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, thinking_mode="auto")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

inputs = processor(images=image, text=text, return_tensors="pt").to("cuda")

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=16384)
output_ids = generated_ids[0][len(inputs.input_ids[0]) :]

# Decode output directly
output_text = processor.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

print("Auto-Thinking Output:", output_text)

Using vLLM for Fast R-4B Deployment and Inference

  • We recommend vLLM for high-throughput R-4B deployment and inference.

Install

R-4B currently requires the latest vLLM. Please install it from source:

git clone https://github.com/vllm-project/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install --editable .
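
After installation, a quick sanity check (assuming the editable build finished cleanly) confirms vLLM is importable:

python -c "import vllm; print(vllm.__version__)"
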
Online Serving

The thinking_mode switch is also available in APIs created by vLLM; a sketch of passing it through the OpenAI client follows the example below.

  • Serve
vllm serve \
    YannQi/R-4B \
    --served-model-name rvl \
    --tensor-parallel-size 8 \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8000 \
    --trust-remote-code
  • OpenAI Chat Completion Client
from openai import OpenAI


# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# image url
image_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "http://images.cocodataset.org/val2017/000000039769.jpg"
                },
            },
            {"type": "text", "text": "Describe this image."},
        ],
    },
]

chat_response = client.chat.completions.create(
    model="rvl",
    messages=image_messages,
)
print("Chat response:", chat_response)

πŸ“ˆ Experimental Results

[Figure: R-4B performance]
  1. R-4B delivers state-of-the-art perceptual abilities among models of comparable size and remains competitive with larger models.
  2. On evaluation sets that require complex logical reasoning and mathematical problem-solving, such as WeMath, MathVerse, and LogicVista, R-4B performs strongly, highlighting its advanced adaptive thinking capacity for logical deduction and complex quantitative problems.

βœ’οΈ Citation

Coming soon!

Acknowledgements

R-4B is built on the codebases of the following projects: LLaVA-Next, SigLIP2, Qwen3, Qwen2.5-VL, and VLMEvalKit. We sincerely thank these projects for their outstanding work.
