|
--- |
|
license: apache-2.0 |
|
--- |
|
|
|
### Overview |
|
|
|
Osmosis-MCP-4B is based on the Qwen3-4B model, fine-tuned with reinforcement learning to excel at multi-step, MCP-style (Model Context Protocol) tool usage.
|
|
|
We trained Osmosis-MCP-4B using a custom curriculum of **multi-turn, tool-reliant prompts** that mimic real-world use cases — for example: |
|
|
|
> *"Given the weather in San Francisco, what are the top hiking locations?"* |
|
|
|
In addition, we provide a set of deterministic, MCP-like functions and mock server-side behavior for the model to call during training.
|
|
|
This requires the model to reason through chained tool invocations (e.g., weather lookup → location ranker) and to prefer tools over its own intuition when applicable, as sketched below.
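
To make this concrete, here is a minimal, hypothetical sketch of a deterministic, MCP-like tool pair with mocked server-side behavior for the example above. The function names, schemas, and canned data are illustrative only and are not the actual tools used in training.

```python
# Hypothetical mock tools in the spirit of the weather -> location-ranker chain.
# Deterministic outputs make rollouts reproducible and rewards easy to verify.

MOCK_WEATHER = {
    "San Francisco": {"condition": "fog", "temp_f": 58},
}

MOCK_TRAILS = {
    "San Francisco": [
        {"name": "Lands End Trail", "best_in": ["fog", "clear"]},
        {"name": "Mount Tamalpais", "best_in": ["clear"]},
    ],
}


def get_weather(city: str) -> dict:
    """Deterministic mock of a weather tool (server-side behavior is canned)."""
    return MOCK_WEATHER.get(city, {"condition": "unknown", "temp_f": None})


def rank_hiking_locations(city: str, condition: str) -> list[str]:
    """Deterministic mock of a location-ranking tool."""
    trails = MOCK_TRAILS.get(city, [])
    # Trails suited to the current condition are ranked first.
    return [t["name"] for t in sorted(trails, key=lambda t: condition not in t["best_in"])]


# A correct trajectory chains the tools: fetch the weather first, then rank with it.
weather = get_weather("San Francisco")
print(rank_hiking_locations("San Francisco", weather["condition"]))
```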
|
|
|
--- |
|
|
|
### Training Approach |
|
|
|
Our training pipeline leverages: |
|
|
|
- [**Dr. GRPO**](https://arxiv.org/abs/2503.20783) for stable and sample-efficient reinforcement learning (sketched briefly after this list).
|
- **Synthetic multi-step MCP interactions** with strong tool chaining behavior, generated using our internal data engine. |
|
- **SGLang + VeRL** for efficient multi-turn rollout environments, with Qwen3-4B as the base model thanks to its function-calling capabilities.
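
For intuition, below is a minimal sketch of the group-relative advantage and token-level surrogate as we read the Dr. GRPO paper, not our actual training code: rewards are centered on the group mean without GRPO's division by the group standard deviation, and token terms are summed per response rather than length-normalized (PPO-style ratio clipping is omitted for brevity).

```python
import torch


def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Advantages for G rollouts of one prompt (shape [G]).

    Dr. GRPO-style: center on the group mean only, without dividing
    by the group's reward standard deviation as vanilla GRPO does.
    """
    return rewards - rewards.mean()


def policy_surrogate(logprobs: torch.Tensor, mask: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style surrogate loss for one prompt.

    logprobs, mask: [G, T] token log-probabilities and response mask.
    Token terms are summed per response (no per-response length
    normalization), then averaged over the group.
    """
    adv = group_relative_advantages(rewards)        # [G]
    per_response = (logprobs * mask).sum(dim=-1)    # [G]
    return -(adv.detach() * per_response).mean()
```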
|
|
|
Through this training methodology, we observed a notable behavioral shift: the model **prefers invoking tools** when appropriate, instead of relying solely on pre-trained intuition — a key milestone for MCP-native agents. |
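
For reference, here is a hedged usage sketch with Hugging Face `transformers` tool calling. The hub ID, the example tool, and the generation settings are placeholders to adapt to your deployment; Qwen-style chat templates accept a `tools` argument, but verify against the chat template shipped with the released checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "osmosis-ai/Osmosis-MCP-4B"  # placeholder hub ID; adjust to the released repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")


def get_weather(city: str):
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...  # body is irrelevant here; only the signature and docstring feed the tool schema


messages = [
    {"role": "user", "content": "Given the weather in San Francisco, what are the top hiking locations?"}
]

# With a tool schema in the prompt, the model should emit a tool call
# (e.g. get_weather) rather than answering from pre-trained intuition.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```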
|
|
|
--- |
|
|
|
### Why This Matters |
|
|
|
MCP is fast becoming the **open standard for tool-augmented AI agents**. However: |
|
|
|
- Most top-performing models (e.g., Claude 3.7 Sonnet, Gemini 2.5 Pro) are closed. |
|
- Tool sprawl across clients and servers creates complexity. |
|
- Open models often lack the training to **use tools** effectively, if at all.
|
|
|
Osmosis-MCP-4B addresses all three — it’s small, powerful, and practical. |
|
|
|
--- |