mispeech/r1-aqa

@@ -10,7 +10,7 @@ tags: []
 ## Introduction
-R1-AQA is based on `Qwen2-Audio-7B-Instruc`, but applied group relative policy optimization (GRPO) algorithm to the Audio Question Answering(AQA) task.
 For more details, please refer to our [Github](https://github.com/xiaomi/r1-aqa) and [Report]().

 ## Introduction
+R1-AQA extends `Qwen2-Audio-7B-Instruct` by integrating group relative policy optimization (GRPO). This adaptation enhances the model's capacity for temporal reasoning and contextual alignment in audio question answering (AQA) tasks.
 For more details, please refer to our [Github](https://github.com/xiaomi/r1-aqa) and [Report]().