Overview

Osmosis-MCP-4B is based on the Qwen3-4B model, fine-tuned with reinforcement learning to excel at multi-step, MCP-style tool usage.

We trained Osmosis-MCP-4B using a custom curriculum of multi-turn, tool-reliant prompts that mimic real-world use cases. For example:

"Given the weather in San Francisco, what are the top hiking locations?"

In addition, we provide a list of deterministic, MCP-like functions and mock server-side behavior for the model to call and use.

This requires the model to reason through multiple tool invocations (e.g., weather → location ranker) and to choose tools over intuition when applicable.
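A minimal sketch of what such deterministic, MCP-like mock tools might look like. The function names (`get_weather`, `rank_hiking_locations`) and canned responses are illustrative assumptions, not the actual training functions; the point is the chained dependency the model must learn, where the second tool's input comes from the first tool's output.

```python
def get_weather(city: str) -> dict:
    """Mock server-side weather tool: returns a deterministic canned response."""
    canned = {"San Francisco": {"condition": "fog", "temp_f": 58}}
    return canned.get(city, {"condition": "clear", "temp_f": 70})


def rank_hiking_locations(condition: str) -> list[str]:
    """Mock ranker tool: picks trails suited to the given weather condition."""
    if condition == "fog":
        return ["Lands End Trail", "Presidio Loop"]
    return ["Mount Tamalpais", "Marin Headlands"]


# The multi-step chain the model must discover: weather -> location ranker.
weather = get_weather("San Francisco")
trails = rank_hiking_locations(weather["condition"])
print(trails)
```

Because the tools are deterministic, every rollout of the same prompt can be scored against a known-correct tool-call sequence.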


Training Approach

Our training pipeline leverages:

  • Dr. GRPO for stable and sample-efficient reinforcement learning.
  • Synthetic multi-step MCP interactions with strong tool chaining behavior, generated using our internal data engine.
  • SGLang + VeRL for efficient multi-turn rollout environments, built on top of Qwen3-4B for its function-calling capabilities.
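As a rough illustration of the first bullet, here is a hedged sketch of the Dr. GRPO advantage computation (not the project's actual training code): compared to vanilla GRPO, Dr. GRPO drops the per-group standard-deviation normalization and the per-response length normalization, leaving a simple group-relative baseline.

```python
def dr_grpo_advantages(rewards: list[float]) -> list[float]:
    """Dr. GRPO-style advantage: each rollout's reward minus the group mean.

    No division by the group std and no length normalization, which is the
    key simplification Dr. GRPO makes over vanilla GRPO.
    """
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


# Rewards for a group of rollouts sampled from the same prompt.
print(dr_grpo_advantages([1.0, 0.0, 0.5, 0.5]))  # -> [0.5, -0.5, 0.0, 0.0]
```

In the multi-turn rollout environments, each reward would come from scoring a full tool-calling trajectory against the deterministic mock tools.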

Through this training methodology, we observed a notable behavioral shift: the model prefers invoking tools when appropriate, instead of relying solely on pre-trained intuition, a key milestone for MCP-native agents.


Why This Matters

MCP is fast becoming the open standard for tool-augmented AI agents. However:

  • Most top-performing models (e.g., Claude 3.7 Sonnet, Gemini 2.5 Pro) are closed.
  • Tool sprawl across clients and servers creates complexity.
  • Open models often lack the training to effectively use tools at all.


Model tree for Prince-1/Osmosis-Mcp-Rkllm

This model is a quantized version of Osmosis-MCP-4B.