Audio-Text-to-Text
Transformers
Safetensors
qwen2_audio
text2text-generation
Inference Endpoints
franken committed on
Commit 4011eae · verified · 1 parent: 26fcbc8

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -8,6 +8,10 @@ tags: []
 
 <!-- Provide a quick summary of what the model is/does. -->
 
+## Introduction
+
+R1-AQA is based on `Qwen2-Audio-7B-Instruct`, but applies the group relative policy optimization (GRPO) algorithm to the Audio Question Answering (AQA) task.
+For more details, please refer to our [GitHub](https://github.com/xiaomi/r1-aqa) and [Report]().
 
 
 ## Inference
@@ -25,7 +29,7 @@ def _get_audio(wav_path):
     return audio
 
 model_name = "mispeech/r1-aqa"
-audio_url = "test-mini-audios/3fe64f3d-282c-4bc8-a753-68f8f6c35652.wav"
+audio_url = "test-mini-audios/3fe64f3d-282c-4bc8-a753-68f8f6c35652.wav" # Copied from the MMAU dataset
 
 processor = AutoProcessor.from_pretrained(model_name)
 model = Qwen2AudioForConditionalGeneration.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
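The "group relative" part of GRPO, which the new Introduction mentions, can be illustrated in a few lines. In GRPO as commonly described, several responses are sampled for the same prompt and each response's reward is normalized against the mean and standard deviation of its group, so no separate value model is needed. The sketch below is illustrative only and does not reproduce the R1-AQA training code. The reward function `binary_accuracy_reward`, the sample answers, and the use of the population standard deviation are all assumptions; the reward design actually used for R1-AQA is described in the linked GitHub repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and standard deviation.

    `rewards` holds the scalar rewards of G responses sampled for the same
    audio question; one advantage is returned per response. `eps` avoids a
    division by zero when every response receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


def binary_accuracy_reward(prediction, answer):
    """Hypothetical reward: 1.0 if the chosen option matches the reference, else 0.0."""
    return 1.0 if prediction.strip().lower() == answer.strip().lower() else 0.0


# Hypothetical example: four sampled answers to one multiple-choice audio question.
answer = "dog barking"
samples = ["Dog barking", "car horn", "dog barking", "rain"]
rewards = [binary_accuracy_reward(s, answer) for s in samples]
advantages = group_relative_advantages(rewards)
print(rewards)  # [1.0, 0.0, 1.0, 0.0]
print(advantages)
```

Correct answers receive a positive advantage and incorrect ones a negative advantage of roughly equal magnitude. If every sample in a group is scored the same, all advantages are zero and that group contributes no policy-gradient signal.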