|
--- |
|
license: apache-2.0 |
|
--- |
|
|
|
### Overview |
|
|
|
Osmosis-MCP-4B is based on the Qwen3-4B model, fine-tuned with reinforcement learning to excel at multi-step, MCP-style (Model Context Protocol) tool usage.
|
|
|
We trained Osmosis-MCP-4B using a custom curriculum of **multi-turn, tool-reliant prompts** that mimic real-world use cases — for example: |
|
|
|
> *"Given the weather in San Francisco, what are the top hiking locations?"* |
|
|
|
In addition, we provide a set of deterministic, MCP-like functions and mock server-side behavior for the model to call during training.
|
|
|
This requires the model to reason through chained tool invocations (e.g., weather lookup → location ranker) and to prefer tools over its own intuition when applicable, as sketched below.
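
To make this concrete, here is a minimal, hypothetical sketch of a deterministic, MCP-like tool pair with mocked server-side behavior for the example above. The function names, schemas, and canned data are illustrative only and are not the actual tools used in training.

```python
# Hypothetical mock tools in the spirit of the weather -> location-ranker chain.
# Deterministic outputs make rollouts reproducible and rewards easy to verify.

MOCK_WEATHER = {
    "San Francisco": {"condition": "fog", "temp_f": 58},
}

MOCK_TRAILS = {
    "San Francisco": [
        {"name": "Lands End Trail", "best_in": ["fog", "clear"]},
        {"name": "Mount Tamalpais", "best_in": ["clear"]},
    ],
}


def get_weather(city: str) -> dict:
    """Deterministic mock of a weather tool (server-side behavior is canned)."""
    return MOCK_WEATHER.get(city, {"condition": "unknown", "temp_f": None})


def rank_hiking_locations(city: str, condition: str) -> list[str]:
    """Deterministic mock of a location-ranking tool."""
    trails = MOCK_TRAILS.get(city, [])
    # Trails suited to the current condition are ranked first.
    return [t["name"] for t in sorted(trails, key=lambda t: condition not in t["best_in"])]


# A correct trajectory chains the tools: fetch the weather first, then rank with it.
weather = get_weather("San Francisco")
print(rank_hiking_locations("San Francisco", weather["condition"]))
```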
|
|
|
--- |
|
|
|
### Training Approach |
|
|
|
Our training pipeline leverages: |
|
|
|
- [**Dr. GRPO**](https://arxiv.org/abs/2503.20783) for stable and sample-efficient reinforcement learning (sketched briefly after this list).
|
- **Synthetic multi-step MCP interactions** with strong tool chaining behavior, generated using our internal data engine. |
|
- **SGLang + VeRL** for efficient multi-turn rollout environments, with Qwen3-4B as the base model thanks to its function-calling capabilities.
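
For intuition, below is a minimal sketch of the group-relative advantage and token-level surrogate as we read the Dr. GRPO paper, not our actual training code: rewards are centered on the group mean without GRPO's division by the group standard deviation, and token terms are summed per response rather than length-normalized (PPO-style ratio clipping is omitted for brevity).

```python
import torch


def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Advantages for G rollouts of one prompt (shape [G]).

    Dr. GRPO-style: center on the group mean only, without dividing
    by the group's reward standard deviation as vanilla GRPO does.
    """
    return rewards - rewards.mean()


def policy_surrogate(logprobs: torch.Tensor, mask: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style surrogate loss for one prompt.

    logprobs, mask: [G, T] token log-probabilities and response mask.
    Token terms are summed per response (no per-response length
    normalization), then averaged over the group.
    """
    adv = group_relative_advantages(rewards)        # [G]
    per_response = (logprobs * mask).sum(dim=-1)    # [G]
    return -(adv.detach() * per_response).mean()
```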
|
|
|
Through this training methodology, we observed a notable behavioral shift: the model **prefers invoking tools** when appropriate, instead of relying solely on pre-trained intuition — a key milestone for MCP-native agents. |
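
For reference, here is a hedged usage sketch with Hugging Face `transformers` tool calling. The hub ID, the example tool, and the generation settings are placeholders to adapt to your deployment; Qwen-style chat templates accept a `tools` argument, but verify against the chat template shipped with the released checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "osmosis-ai/Osmosis-MCP-4B"  # placeholder hub ID; adjust to the released repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")


def get_weather(city: str):
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...  # body is irrelevant here; only the signature and docstring feed the tool schema


messages = [
    {"role": "user", "content": "Given the weather in San Francisco, what are the top hiking locations?"}
]

# With a tool schema in the prompt, the model should emit a tool call
# (e.g. get_weather) rather than answering from pre-trained intuition.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```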
|
|
|
--- |
|
|
|
### Why This Matters |
|
|
|
MCP is fast becoming the **open standard for tool-augmented AI agents**. However: |
|
|
|
- Most top-performing models (e.g., Claude 3.7 Sonnet, Gemini 2.5 Pro) are closed. |
|
- Tool sprawl across clients and servers creates complexity. |
|
- Open models often lack the training to **use tools** effectively, if at all.
|
|
|
Osmosis-MCP-4B addresses all three — it’s small, powerful, and practical. |
|
|
|
--- |