xiongwang committed
Commit 6bd5dde · verified · 1 Parent(s): edd4b81

Update README.md

Files changed (1):
  1. README.md +93 -74
README.md CHANGED
@@ -44,22 +44,68 @@ Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse moda
 ### Performance
 
 <details>
- <summary>Text -> Text</summary>
 
- | Dataset | Qwen2.5-Omni-7B | Qwen2.5-7B | Qwen2-7B | Llama3.1-8B | Gemma2-9B |
- |-----------------------------------|-----------|------------|----------|-------------|-----------|
- | MMLU-Pro | 47.0 | **56.3** | 44.1 | 48.3 | 52.1 |
- | MMLU-redux | 71.0 | **75.4** | 67.3 | 67.2 | 72.8 |
- | LiveBench<sub>0831</sub> | 29.6 | **35.9** | 29.2 | 26.7 | 30.6 |
- | GPQA | 30.8 | **36.4** | 34.3 | 32.8 | 32.8 |
- | MATH | 71.5 | **75.5** | 52.9 | 51.9 | 44.3 |
- | GSM8K | 88.7 | **91.6** | 85.7 | 84.5 | 76.7 |
- | HumanEval | 79.9 | **84.8** | 79.9 | 72.6 | 68.9 |
- | MBPP | 73.7 | **79.2** | 67.2 | 69.6 | 74.9 |
- | MultiPL-E | 67.0 | **70.4** | 59.1 | 50.7 | 53.4 |
- | LiveCodeBench<sub>2305-2409</sub> | 25.2 | **28.7** | 23.9 | 8.3 | 18.9 |
 </details>
 
 <details>
 <summary>Audio -> Text</summary>
 
@@ -451,67 +497,6 @@ Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse moda
 | EgoSchema<sub>test</sub> | **69.6** | 63.2 | 65.0 | - |
 </details>
 
- <details>
- <summary>Multimodality -> Text</summary>
-
- <style type="text/css">
- .tg {border-collapse:collapse;border-spacing:0;}
- .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
- overflow:hidden;padding:10px 5px;word-break:normal;}
- .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
- font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
- .tg .tg-0lax{text-align:left;vertical-align:top}
- </style>
- <table class=""><thead>
- <tr>
- <th class="tg-0lax">Datasets</th>
- <th class="tg-0lax">Model</th>
- <th class="tg-0lax">Performance</th>
- </tr></thead>
- <tbody>
- <tr>
- <td class="tg-0lax" rowspan="10">OmniBench<br>Speech | Sound Event | Music | Avg</td>
- <td class="tg-0lax">Gemini-1.5-Pro</td>
- <td class="tg-0lax">42.67%|42.26%|46.23%|42.91%</td>
- </tr>
- <tr>
- <td class="tg-0lax">MIO-Instruct</td>
- <td class="tg-0lax">36.96%|33.58%|11.32%|33.80%</td>
- </tr>
- <tr>
- <td class="tg-0lax">AnyGPT (7B)</td>
- <td class="tg-0lax">17.77%|20.75%|13.21%|18.04%</td>
- </tr>
- <tr>
- <td class="tg-0lax">video-SALMONN</td>
- <td class="tg-0lax">34.11%|31.70%|<span style="font-weight:bold">56.60%</span>|35.64%</td>
- </tr>
- <tr>
- <td class="tg-0lax">UnifiedIO2-xlarge</td>
- <td class="tg-0lax">39.56%|36.98%|29.25%|38.00%</td>
- </tr>
- <tr>
- <td class="tg-0lax">UnifiedIO2-xxlarge</td>
- <td class="tg-0lax">34.24%|36.98%|29.25%|38.00%</td>
- </tr>
- <tr>
- <td class="tg-0lax">MiniCPM-o</td>
- <td class="tg-0lax">34.24%|36.98%|24.53%|33.98%</td>
- </tr>
- <tr>
- <td class="tg-0lax">Baichuan-Omni-1.5</td>
- <td class="tg-0lax">-|-|-|40.50%</td>
- </tr>
- <tr>
- <td class="tg-0lax">Qwen2-Audio</td>
- <td class="tg-0lax">-|-|-|42.90%</td>
- </tr>
- <tr>
- <td class="tg-0lax">Qwen2.5-Omni-7B</td>
- <td class="tg-0lax"><span style="font-weight:bold">55.25%</span>|<span style="font-weight:bold">60.00%</span>|52.83%|<span style="font-weight:bold">56.13%</span></td>
- </tr>
- </tbody></table>
- </details>
 
 <details>
 <summary>Zero-shot Speech Generation</summary>
@@ -615,6 +600,23 @@ Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse moda
 </tbody></table>
 </details>
 
 ## Quickstart
 
 Below, we provide simple examples showing how to use Qwen2.5-Omni with 🤗 Transformers. The Qwen2.5-Omni code in Hugging Face Transformers is still at the pull-request stage and has not yet been merged into the main branch, so you may need to build from source with the following command:
@@ -886,4 +888,21 @@ model = Qwen2_5OmniModel.from_pretrained(
 )
 ```
 
 <br>
 
 ### Performance
 
 <details>
+ <summary>Multimodality -> Text</summary>
 
+ <style type="text/css">
+ .tg {border-collapse:collapse;border-spacing:0;}
+ .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
+ overflow:hidden;padding:10px 5px;word-break:normal;}
+ .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
+ font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
+ .tg .tg-0lax{text-align:left;vertical-align:top}
+ </style>
+ <table class=""><thead>
+ <tr>
+ <th class="tg-0lax">Datasets</th>
+ <th class="tg-0lax">Model</th>
+ <th class="tg-0lax">Performance</th>
+ </tr></thead>
+ <tbody>
+ <tr>
+ <td class="tg-0lax" rowspan="10">OmniBench<br>Speech | Sound Event | Music | Avg</td>
+ <td class="tg-0lax">Gemini-1.5-Pro</td>
+ <td class="tg-0lax">42.67%|42.26%|46.23%|42.91%</td>
+ </tr>
+ <tr>
+ <td class="tg-0lax">MIO-Instruct</td>
+ <td class="tg-0lax">36.96%|33.58%|11.32%|33.80%</td>
+ </tr>
+ <tr>
+ <td class="tg-0lax">AnyGPT (7B)</td>
+ <td class="tg-0lax">17.77%|20.75%|13.21%|18.04%</td>
+ </tr>
+ <tr>
+ <td class="tg-0lax">video-SALMONN</td>
+ <td class="tg-0lax">34.11%|31.70%|<span style="font-weight:bold">56.60%</span>|35.64%</td>
+ </tr>
+ <tr>
+ <td class="tg-0lax">UnifiedIO2-xlarge</td>
+ <td class="tg-0lax">39.56%|36.98%|29.25%|38.00%</td>
+ </tr>
+ <tr>
+ <td class="tg-0lax">UnifiedIO2-xxlarge</td>
+ <td class="tg-0lax">34.24%|36.98%|29.25%|38.00%</td>
+ </tr>
+ <tr>
+ <td class="tg-0lax">MiniCPM-o</td>
+ <td class="tg-0lax">34.24%|36.98%|24.53%|33.98%</td>
+ </tr>
+ <tr>
+ <td class="tg-0lax">Baichuan-Omni-1.5</td>
+ <td class="tg-0lax">-|-|-|40.50%</td>
+ </tr>
+ <tr>
+ <td class="tg-0lax">Qwen2-Audio</td>
+ <td class="tg-0lax">-|-|-|42.90%</td>
+ </tr>
+ <tr>
+ <td class="tg-0lax">Qwen2.5-Omni-7B</td>
+ <td class="tg-0lax"><span style="font-weight:bold">55.25%</span>|<span style="font-weight:bold">60.00%</span>|52.83%|<span style="font-weight:bold">56.13%</span></td>
+ </tr>
+ </tbody></table>
 </details>
 
+
 <details>
 <summary>Audio -> Text</summary>
 
 | EgoSchema<sub>test</sub> | **69.6** | 63.2 | 65.0 | - |
 </details>
 
 
 <details>
 <summary>Zero-shot Speech Generation</summary>
 
 </tbody></table>
 </details>
 
+ <details>
+ <summary>Text -> Text</summary>
+
+ | Dataset | Qwen2.5-Omni-7B | Qwen2.5-7B | Qwen2-7B | Llama3.1-8B | Gemma2-9B |
+ |-----------------------------------|-----------|------------|----------|-------------|-----------|
+ | MMLU-Pro | 47.0 | **56.3** | 44.1 | 48.3 | 52.1 |
+ | MMLU-redux | 71.0 | **75.4** | 67.3 | 67.2 | 72.8 |
+ | LiveBench<sub>0831</sub> | 29.6 | **35.9** | 29.2 | 26.7 | 30.6 |
+ | GPQA | 30.8 | **36.4** | 34.3 | 32.8 | 32.8 |
+ | MATH | 71.5 | **75.5** | 52.9 | 51.9 | 44.3 |
+ | GSM8K | 88.7 | **91.6** | 85.7 | 84.5 | 76.7 |
+ | HumanEval | 79.9 | **84.8** | 79.9 | 72.6 | 68.9 |
+ | MBPP | 73.7 | **79.2** | 67.2 | 69.6 | 74.9 |
+ | MultiPL-E | 67.0 | **70.4** | 59.1 | 50.7 | 53.4 |
+ | LiveCodeBench<sub>2305-2409</sub> | 25.2 | **28.7** | 23.9 | 8.3 | 18.9 |
+ </details>
+
 ## Quickstart
 
 Below, we provide simple examples showing how to use Qwen2.5-Omni with 🤗 Transformers. The Qwen2.5-Omni code in Hugging Face Transformers is still at the pull-request stage and has not yet been merged into the main branch, so you may need to build from source with the following command:
 
 )
 ```
 
+
+ <!-- ## Citation
+
+ If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil: :)
+
+
+
+ ```BibTeX
+
+ @article{Qwen2.5-Omni,
+ title={Qwen2.5-Omni Technical Report},
+ author={},
+ journal={arXiv preprint arXiv:2502.13923},
+ year={2025}
+ }
+ ``` -->
+
  <br>
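
The hunk header above (`@@ -886,4 +888,21 @@ model = Qwen2_5OmniModel.from_pretrained(`) shows only the tail of the README's model-loading snippet. Below is a minimal, hedged sketch of what such a call looks like; it assumes a source-built Transformers preview with Qwen2.5-Omni support, and the `Qwen2_5OmniProcessor` class plus the dtype/device settings are illustrative assumptions rather than content of this commit.

```python
# Minimal loading sketch, assuming a source-built Transformers preview that ships
# Qwen2_5OmniModel. The processor class and the dtype/device choices below are
# illustrative assumptions, not taken from this commit.
import torch
from transformers import Qwen2_5OmniModel, Qwen2_5OmniProcessor

MODEL_ID = "Qwen/Qwen2.5-Omni-7B"

# Load the checkpoint; device_map="auto" spreads layers across available devices.
model = Qwen2_5OmniModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The processor bundles the tokenizer with the audio/image/video feature extractors.
processor = Qwen2_5OmniProcessor.from_pretrained(MODEL_ID)
```

From here, the README's full Quickstart (not shown in this diff) builds a multimodal conversation, prepares inputs with the processor, and calls `model.generate(...)`.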