{ "bomFormat": "CycloneDX", "specVersion": "1.6", "serialNumber": "urn:uuid:1d723570-c4c0-4b5d-b1a1-aac2d99d4b2d", "version": 1,
  "metadata": { "timestamp": "2025-06-05T09:41:47.231012+00:00",
    "component": { "type": "machine-learning-model", "bom-ref": "openbmb/MiniCPM-Llama3-V-2_5-0f9a184d-7b24-5845-b52e-cbdc16daa953", "name": "openbmb/MiniCPM-Llama3-V-2_5",
      "externalReferences": [ { "url": "https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5", "type": "documentation" } ],
      "modelCard": { "modelParameters": { "task": "image-text-to-text", "architectureFamily": "minicpmv", "modelArchitecture": "MiniCPMV", "datasets": [ { "ref": "openbmb/RLAIF-V-Dataset-d1ff380b-52a0-586f-946e-773a0ef8556f" } ] }, "properties": [ { "name": "library_name", "value": "transformers" } ] },
      "authors": [ { "name": "openbmb" } ],
      "description": "**MiniCPM-Llama3-V 2.5** is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:\n\n- \ud83d\udd25 **Leading Performance.** MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max** and greatly outperforms other Llama 3-based MLLMs.\n\n- \ud83d\udcaa **Strong OCR Capabilities.** MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a **700+ score on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro**. Based on recent user feedback, MiniCPM-Llama3-V 2.5 now offers enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex reasoning abilities, enhancing the multimodal interaction experience.\n\n- \ud83c\udfc6 **Trustworthy Behavior.** Leveraging the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) method (the newest technique in the [RLHF-V](https://github.com/RLHF-V) [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a **10.3%** hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), the best-level performance within the open-source community. [Data released](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset).\n\n- \ud83c\udf0f **Multilingual Support.** Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from [VisCPM](https://github.com/OpenBMB/VisCPM), MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to **over 30 languages, including German, French, Spanish, Italian, Korean, Japanese, etc.** [All Supported Languages](./assets/minicpm-llama-v-2-5_languages.md).\n\n- \ud83d\ude80 **Efficient Deployment.** MiniCPM-Llama3-V 2.5 systematically employs **model quantization, CPU optimizations, NPU optimizations and compilation optimizations**, achieving high-efficiency deployment on edge devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a **150-fold speed-up in end-side image encoding for multimodal large models** and a **3-fold increase in language decoding speed**.\n\n- \ud83d\udcab **Easy Usage.** MiniCPM-Llama3-V 2.5 can be used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) and [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) support for efficient CPU inference on local devices, (2) [GGUF](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) format quantized models in 16 sizes, (3) efficient [LoRA](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#lora-finetuning) fine-tuning with only 2 V100 GPUs, (4) [streaming output](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage), (5) quick local WebUI demo setup with [Gradio](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_2.5.py) and [Streamlit](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_streamlit-2_5.py), and (6) interactive demos on [HuggingFace Spaces](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5).",
      "tags": [ "transformers", "safetensors", "minicpmv", "feature-extraction", "minicpm-v", "vision", "ocr", "custom_code", "image-text-to-text", "conversational", "multilingual", "dataset:openbmb/RLAIF-V-Dataset", "region:us" ] } },
  "components": [ { "type": "data", "bom-ref": "openbmb/RLAIF-V-Dataset-d1ff380b-52a0-586f-946e-773a0ef8556f", "name": "openbmb/RLAIF-V-Dataset", "data": [ { "type": "dataset", "bom-ref": "openbmb/RLAIF-V-Dataset-d1ff380b-52a0-586f-946e-773a0ef8556f", "name": "openbmb/RLAIF-V-Dataset", "contents": { "url": "https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset", "properties": [ { "name": "task_categories", "value": "visual-question-answering" }, { "name": "language", "value": "en" }, { "name": "size_categories", "value": "10K" } ] } } ] } ] }