---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-3B
tags:
- code
- humaneval
- multi-agent
- mlgrpo
- qwen2.5
library_name: transformers
pipeline_tag: text-generation
---

# 2xQwen2.5-Coder-3B-Satyr-Aux

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-3B, trained with Multi-LLM Group Relative Policy Optimization (MAGRPO) on the HumanEval dataset.
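
Since the metadata lists `transformers` as the library and `text-generation` as the pipeline, a minimal loading sketch might look like the following. It rests on assumptions not confirmed by this card: that the checkpoint loads as a single causal language model through `AutoModelForCausalLM`, and that plain greedy decoding is a reasonable default. The repository id of the fine-tuned weights is not stated here, so `MODEL_ID` falls back to the base model and should be replaced; `generate_completion` and `PROMPT` are hypothetical names used only for illustration.

```python
# Hypothetical default: the base model named in the metadata.
# Replace with this repository's id to load the fine-tuned weights.
MODEL_ID = "Qwen/Qwen2.5-Coder-3B"

# Example HumanEval-style prompt: a function signature plus docstring.
PROMPT = (
    "def is_palindrome(text: str) -> bool:\n"
    '    """Return True if text reads the same forwards and backwards."""\n'
)


def generate_completion(prompt: str, model_id: str = MODEL_ID,
                        max_new_tokens: int = 256) -> str:
    """Greedily complete `prompt` and return only the newly generated text."""
    # Imported lazily so that defining this helper does not load the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Drop the prompt tokens so only the completion is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)


# Usage (downloads the weights on first call):
# print(generate_completion(PROMPT))
```

Calling `generate_completion(PROMPT)` downloads the weights on first use. If the repository stores more than one agent's weights, the loading step would need to be adapted to that layout.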