fix compatibility issue for transformers 4.46+
Files changed:
- README.md (+9 -8)
- configuration_internvl_chat.py (+2 -2)
- modeling_intern_vit.py (+1 -0)
README.md
@@ -5,6 +5,7 @@ library_name: transformers
 base_model:
 - OpenGVLab/InternViT-300M-448px
 - internlm/internlm2-chat-1_8b
+new_version: OpenGVLab/InternVL2_5-2B
 base_model_relation: merge
 language:
 - multilingual
@@ -19,7 +20,7 @@ tags:
 
 # Mini-InternVL-Chat-2B-V1-5
 
-[\[🤖 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0
+[\[🤖 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5\]](https://arxiv.org/abs/2404.16821) [\[📜 Mini-InternVL\]](https://arxiv.org/abs/2410.16261)
 
 [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/) [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/706547971) [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
 
@@ -69,7 +70,7 @@ We provide an example code to run Mini-InternVL-Chat-2B-V1-5 using `transformers`
 
 We also welcome you to experience the InternVL2 series models in our [online demo](https://internvl.opengvlab.com/).
 
-> Please use transformers==4.37.2 to ensure the model works normally.
+> Please use transformers>=4.37.2 to ensure the model works normally.
 
 ### Model Loading
 
@@ -379,7 +380,7 @@ response, history = model.chat(tokenizer, pixel_values, question, generation_config
 print(f'User: {question}\nAssistant: {response}')
 ```
 
-#### Streaming
+#### Streaming Output
 
 Besides this method, you can also use the following code to get streamed output.
 
@@ -419,12 +420,12 @@ Many repositories now support fine-tuning of the InternVL series models, including
 LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by the MMRazor and MMDeploy teams.
 
 ```sh
-pip install lmdeploy
+pip install lmdeploy>=0.5.3
 ```
 
 LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference pipeline.
 
-#### A 'Hello, world'
+#### A 'Hello, world' Example
 
 ```python
 from lmdeploy import pipeline, TurbomindEngineConfig
@@ -439,7 +440,7 @@ print(response.text)
 
 If `ImportError` occurs while executing this case, please install the required dependency packages as prompted.
 
-#### Multi-images
+#### Multi-images Inference
 
 When dealing with multiple images, you can put them all in one list. Keep in mind that multiple images will lead to a higher number of input tokens, and as a result, the size of the context window typically needs to be increased.
 
@@ -464,7 +465,7 @@ response = pipe((f'Image-1: {IMAGE_TOKEN}\nImage-2: {IMAGE_TOKEN}\ndescribe these
 print(response.text)
 ```
 
-#### Batch
+#### Batch Prompts Inference
 
 Conducting inference with batch prompts is quite straightforward; just place them within a list structure:
 
@@ -484,7 +485,7 @@ response = pipe(prompts)
 print(response)
 ```
 
-#### Multi-turn
+#### Multi-turn Conversation
 
 There are two ways to do the multi-turn conversations with the pipeline. One is to construct messages according to the format of OpenAI and use above introduced method, the other is to use the `pipeline.chat` interface.
 
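The retitled LMDeploy sections above all revolve around the same `pipeline` API. For orientation, here is a minimal sketch of the 'Hello, world' flow the card describes; the image URL and `session_len` value are illustrative assumptions, not taken from this diff:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# Build a VLM pipeline for this card's model; session_len is an assumed value.
pipe = pipeline('OpenGVLab/Mini-InternVL-Chat-2B-V1-5',
                backend_config=TurbomindEngineConfig(session_len=8192))

# Any RGB image works; this URL is only an example.
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response.text)
```

Prompts pair with images as `(text, image)` tuples; passing a list of such tuples gives the batched behavior covered under "Batch Prompts Inference".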
configuration_internvl_chat.py
@@ -39,11 +39,11 @@ class InternVLChatConfig(PretrainedConfig):
         super().__init__(**kwargs)
 
         if vision_config is None:
-            vision_config = {}
+            vision_config = {'architectures': ['InternVisionModel']}
             logger.info('vision_config is None. Initializing the InternVisionConfig with default values.')
 
         if llm_config is None:
-            llm_config = {}
+            llm_config = {'architectures': ['InternLM2ForCausalLM']}
             logger.info('llm_config is None. Initializing the LlamaConfig config with default values (`LlamaConfig`).')
 
         self.vision_config = InternVisionConfig(**vision_config)
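This hunk is the substance of the 4.46+ fix: default-constructed sub-configs now carry an `architectures` hint. The InternVL modeling code appears to dispatch on `config.llm_config.architectures` to choose between `LlamaForCausalLM` and `InternLM2ForCausalLM`, so an empty `{}` default leaves nothing to dispatch on once newer transformers releases rebuild sub-configs from plain dicts. A minimal sketch of why the hint survives a save/load round trip, assuming only stock `PretrainedConfig` behavior (the temporary directory is illustrative):

```python
import json
import tempfile

from transformers import PretrainedConfig

# Mirror the patched default from __init__ above.
cfg = PretrainedConfig(architectures=['InternLM2ForCausalLM'])

# `architectures` is a first-class PretrainedConfig attribute, so it is
# written to config.json by save_pretrained() rather than silently dropped.
with tempfile.TemporaryDirectory() as tmp:
    cfg.save_pretrained(tmp)
    reloaded = json.load(open(f'{tmp}/config.json'))
    assert reloaded['architectures'] == ['InternLM2ForCausalLM']
```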
modeling_intern_vit.py
@@ -3,6 +3,7 @@
 # Copyright (c) 2024 OpenGVLab
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------
+
 from typing import Optional, Tuple, Union
 
 import torch
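Since the commit targets compatibility with transformers 4.46+ while the card still states `transformers>=4.37.2`, a quick environment check before loading the model can save a confusing traceback. A minimal sketch; `packaging` ships as a transformers dependency, but treat its availability as an assumption:

```python
from packaging import version

import transformers

# The card asks for >=4.37.2; this commit makes the remote code work on 4.46+ too.
required = version.parse('4.37.2')
installed = version.parse(transformers.__version__)
assert installed >= required, (
    f'transformers {installed} found, but >= {required} is required'
)
print(f'transformers {installed} OK')
```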