---
pipeline_tag: any-to-any
datasets:
- openbmb/RLAIF-V-Dataset
library_name: transformers
language:
- multilingual
tags:
- minicpm-o
- omni
- vision
- ocr
- multi-image
- video
- custom_code
- audio
- speech
- voice cloning
- live Streaming
- realtime speech conversation
- asr
- tts
---

Samantha-Omni is a fine-tune of the MiniCPM omni model (MiniCPM-o).

A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

![image/png](https://cdn-uploads.huggingface.co/production/uploads/638fd4be2ddd69e70b8cd31c/ZgfjZumhUeMp-Mfvfzdm1.png)

```bib
@article{yao2024minicpm,
  title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
  journal={arXiv preprint arXiv:2408.01800},
  year={2024}
}
```