
Training details of Osmosis-MCP-4B

#3
by xuuxu - opened

Thank you for your outstanding work and for open-sourcing the model. As a beginner who recently started exploring MCP and model fine-tuning, I've found your contributions incredibly helpful.

In your blog post, you mentioned benchmarking Osmosis-MCP-4B against Claude 3.7 Sonnet and GPT-4o on GSM8K, with results showing that Osmosis-MCP-4B achieves performance comparable to these leading foundation models. I have a few questions regarding your methodology:

Could you share more details about the training data used? Specifically, was GSM8K excluded from the training dataset?
You mentioned using "synthetic multi-step MCP interactions with strong tool chaining behavior, generated using our internal data engine." Would you be able to elaborate on what this internal data engine entails?

I'm particularly fascinated by your approach and would greatly appreciate any additional insights you could share.

Looking forward to your response.

Additionally, when I ran the test script you provided (https://huggingface.co/osmosis-ai/osmosis-mcp-4b/tree/tests/test/vllm), all of the final results were correct, but the output itself showed some issues:
In the second test, "Testing square root of 120," the model preferred to output the answer directly rather than calling the tool, and it kept generating until it hit the token limit.
In the third test, "Testing sequential chess moves," the model produced a lot of irrelevant output, such as Python code, and likewise kept generating until the token limit.
This runaway generation also made the tests very slow.
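Until the root cause is found, a client-side guard can at least bound the wasted time: besides capping length server-side (vLLM's SamplingParams supports max_tokens), a simple repetition check on the streamed tokens can cut off runaway loops earlier. A minimal sketch, assuming you collect tokens from a stream; the function name and thresholds here are my own, not part of the provided test script:

```python
def truncate_runaway(tokens, max_tokens=512, window=32, max_repeats=4):
    """Collect streamed tokens, stopping early if the last `window`
    tokens repeat back-to-back `max_repeats` times, or when
    `max_tokens` is reached. Hypothetical guard, not from the repo."""
    out = []
    for tok in tokens:
        out.append(tok)
        if len(out) >= max_tokens:
            break  # hard cap, mirrors a server-side max_tokens
        if len(out) >= window * max_repeats:
            tail = out[-window:]
            # compare the trailing window against the previous windows
            if all(out[-window * (i + 1):len(out) - window * i] == tail
                   for i in range(1, max_repeats)):
                break  # the model is looping; stop reading
    return out
```

This only catches exact short-period loops; it would not stop the "answers directly instead of calling the tool" behavior, which looks like a training or prompting issue rather than a decoding one.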
log:
=== TEST RESULTS ===
✅ SUCCESS - basic_chess_move (took 2.30s)
✅ SUCCESS - square_root (took 29.55s)
  • Tested √120 (expected: ~10.954, matched: 10.954451150103322)
✅ SUCCESS - sequential_chess (took 211.94s)

=== SUMMARY ===
Total tests: 3
Successful: 3
Failed: 0
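For what it's worth, the matched value in the log is exactly what Python's standard library gives for √120, so the tool result itself checks out:

```python
import math

# √120 via the standard library; matches the value reported in the log
print(math.sqrt(120))  # 10.954451150103322
```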

And I didn't modify any parameters in the script. Did you encounter similar issues during your testing?
Looking forward to your response.

I would also like to know the training details. Are there any papers or open-source repos that go into more depth?
