Update README.md
README.md
license: apache-2.0
---
### Overview

Osmosis-MCP-4B is based on the Qwen3-4B model and fine-tuned with reinforcement learning to excel at multi-step, MCP-style (Model Context Protocol) tool use.

We trained Osmosis-MCP-4B on a custom curriculum of **multi-turn, tool-reliant prompts** that mimic real-world use cases, for example:

> *"Given the weather in San Francisco, what are the top hiking locations?"*

Answering a prompt like this requires the model to reason through multiple tool invocations (e.g., weather → location ranker) and to choose tools over intuition when applicable.
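The weather → location ranker chain can be sketched as follows. This is a minimal illustration, not Osmosis code. It assumes the JSON-RPC 2.0 `tools/call` request and text-content result shapes from the MCP specification. The tools `get_weather` and `rank_hiking_locations`, their arguments, the trail names, and the `mock_server` stand-in are all hypothetical, and the tool-selection decisions the model would make are hard-coded.

```python
import itertools
import json
from typing import Any, Callable


# Hypothetical mock tools standing in for real MCP servers.
def get_weather(city: str) -> dict[str, Any]:
    # Fixed fake observation so the example is deterministic.
    return {"city": city, "condition": "sunny", "temp_c": 18}


def rank_hiking_locations(city: str, condition: str) -> list[str]:
    # Hypothetical ranker: exposed trails first when sunny, sheltered ones otherwise.
    exposed = ["Mount Tamalpais", "Lands End Trail"]
    sheltered = ["Muir Woods"]
    return exposed + sheltered if condition == "sunny" else sheltered + exposed


TOOLS: dict[str, Callable[..., Any]] = {
    "get_weather": get_weather,
    "rank_hiking_locations": rank_hiking_locations,
}
_request_ids = itertools.count(1)


def make_tools_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request shaped like an MCP `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def mock_server(request: dict[str, Any]) -> dict[str, Any]:
    """Dispatch the call locally and wrap the output as MCP-style text content."""
    params = request["params"]
    output = TOOLS[params["name"]](**params["arguments"])
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {
            "content": [{"type": "text", "text": json.dumps(output)}],
            "isError": False,
        },
    }


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    # Round-trip one tool call and decode the JSON text payload.
    response = mock_server(make_tools_call(name, arguments))
    return json.loads(response["result"]["content"][0]["text"])


def answer_hiking_question(city: str) -> list[str]:
    # Step 1: look the weather up instead of guessing it.
    weather = call_tool("get_weather", {"city": city})
    # Step 2: feed the first tool's output into the second tool.
    return call_tool(
        "rank_hiking_locations", {"city": city, "condition": weather["condition"]}
    )


print(answer_hiking_question("San Francisco"))
```

The key property is the data dependency. The second call cannot be issued until the first call's result is known, so the model has to plan across turns rather than answer in a single shot.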

---

### Training Approach

Our training pipeline leverages:

- **Dr. GRPO** (a bias-corrected variant of GRPO, the policy optimization algorithm introduced by DeepSeek) for stable and sample-efficient reinforcement learning.
- **Synthetic multi-step MCP interactions** with strong tool-chaining behavior, generated with our internal data engine.
- **SGLang + VeRL** for efficient multi-turn rollout environments, with Qwen3-4B chosen as the base model for its function-calling capabilities.
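Dr. GRPO changes two normalization terms in GRPO's objective, and the sketch below puts both side by side. It follows the published description of Dr. GRPO: drop the division by the group's reward standard deviation, and drop the per-response `1/|o_i|` length normalization. It is not Osmosis's training code. The rewards, response lengths, simplified per-token loss `-A_i` (no importance ratio or clipping), and `max_tokens` constant are illustrative assumptions.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO: normalize rewards by the group mean and standard deviation."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dr_grpo_advantages(rewards: list[float]) -> list[float]:
    """Dr. GRPO: subtract the group mean only; no standard-deviation division."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


def grpo_aggregate(token_losses: list[list[float]]) -> float:
    """GRPO: average over each response's own tokens, then over the group.
    Every response gets equal total weight, so per-token weight shrinks with length."""
    return mean(mean(seq) for seq in token_losses)


def dr_grpo_aggregate(token_losses: list[list[float]], max_tokens: int) -> float:
    """Dr. GRPO: sum all token losses and divide by a constant.
    G * max_tokens is an assumed choice; any constant keeps per-token weights equal."""
    return sum(sum(seq) for seq in token_losses) / (len(token_losses) * max_tokens)


# Hypothetical group of G = 4 rollouts: reward 1.0 if the tool chain succeeded.
rewards = [1.0, 0.0, 0.0, 1.0]
lengths = [20, 80, 10, 40]  # response lengths in tokens
adv = dr_grpo_advantages(rewards)  # [0.5, -0.5, -0.5, 0.5]
# Simplified per-token surrogate loss -A_i (importance ratio and clipping omitted).
token_losses = [[-a] * n for a, n in zip(adv, lengths)]
print(grpo_aggregate(token_losses), dr_grpo_aggregate(token_losses, max_tokens=100))
```

Under GRPO's per-response averaging, each token of the 10-token failed response carries eight times the weight of each token in the 80-token failed response (1/10 versus 1/80), so long wrong answers are penalized less per token. The Dr. GRPO aggregation gives every token the same weight.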

Through this training methodology, we observed a notable behavioral shift: the model **prefers invoking tools** when appropriate instead of relying solely on pre-trained intuition. This preference is a key milestone for MCP-native agents.

---
### Why This Matters

MCP is fast becoming the **open standard for tool-augmented AI agents**. However:

- Most top-performing models (e.g., Claude 3.7 Sonnet, Gemini 2.5 Pro) are closed.
- Tool sprawl across clients and servers creates complexity.
- Open models often lack the training to **use tools** effectively, or at all.

Osmosis-MCP-4B addresses all three: it is open (Apache 2.0), small (4B parameters), and trained specifically for multi-step tool use.

---