---
pipeline_tag: any-to-any
datasets:
- openbmb/RLAIF-V-Dataset
library_name: transformers
language:
- multilingual
tags:
- minicpm-o
- omni
- vision
- ocr
- multi-image
- video
- custom_code
- audio
- speech
- voice cloning
- live streaming
- realtime speech conversation
- asr
- tts
---

<h1>Samantha-Omni: A Fine-Tune of MiniCPM-o</h1>
<h1>A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming</h1>

![image/png](https://cdn-uploads.huggingface.co/production/uploads/638fd4be2ddd69e70b8cd31c/ZgfjZumhUeMp-Mfvfzdm1.png)

```bib
@article{yao2024minicpm,
  title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
  journal={arXiv preprint arXiv:2408.01800},
  year={2024}
}
```