---
pipeline_tag: any-to-any
datasets:
- openbmb/RLAIF-V-Dataset
library_name: transformers
language:
- multilingual
tags:
- minicpm-o
- omni
- vision
- ocr
- multi-image
- video
- custom_code
- audio
- speech
- voice cloning
- live Streaming
- realtime speech conversation
- asr
- tts
---

<h1>Samantha-Omni: A Fine-Tune of MiniCPM-o</h1>
<h1>A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming</h1>

![image/png](https://cdn-uploads.huggingface.co/production/uploads/638fd4be2ddd69e70b8cd31c/ZgfjZumhUeMp-Mfvfzdm1.png)

```bib
@article{yao2024minicpm,
  title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
  journal={arXiv preprint arXiv:2408.01800},
  year={2024}
}
```