---
pipeline_tag: any-to-any
datasets:
- openbmb/RLAIF-V-Dataset
library_name: transformers
language:
- multilingual
tags:
- minicpm-o
- omni
- vision
- ocr
- multi-image
- video
- custom_code
- audio
- speech
- voice cloning
- live Streaming
- realtime speech conversation
- asr
- tts
---
Samantha-Omni is a fine-tune of MiniCPM-o (MiniCPM omni).

A GPT-4o-level MLLM for vision, speech, and multimodal live streaming.

```bib
@article{yao2024minicpm,
title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
journal={arXiv preprint arXiv:2408.01800},
year={2024}
}
```