# VLM2Vec

This repo contains the VLM2Vec-Phi3.5V model checkpoint for [VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks](https://arxiv.org/abs/2410.05160). In this paper, we aim to build a unified multimodal embedding model for any task. Our model is built by converting an existing, well-trained VLM (Phi-3.5-V) into an embedding model. The basic idea is to append an [EOS] token to the end of the input sequence and use its final hidden state as the representation of the multimodal input.
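As a rough illustration of this last-token ([EOS]) pooling idea, the sketch below loads the base Phi-3.5-V checkpoint with Hugging Face `transformers` and takes the hidden state at the final position as the embedding. This is not the official VLM2Vec API: the model ID, the `<|image_1|>` prompt placeholder, and the exact processor call follow the base model's usage and are assumptions here; the trained checkpoints and the real inference code live in the [VLM2Vec GitHub repo](https://github.com/TIGER-AI-Lab/VLM2Vec).

```python
# Minimal sketch (not the official VLM2Vec API) of EOS/last-token pooling,
# using the base Phi-3.5-V checkpoint via Hugging Face transformers.
from typing import Optional

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Phi-3.5-vision-instruct"  # base VLM; VLM2Vec fine-tunes on top of it
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    _attn_implementation="eager",  # fall back from flash-attention if it is unavailable
    device_map="auto",
)
model.eval()


def embed(text: str, image: Optional[Image.Image] = None) -> torch.Tensor:
    """Return an L2-normalized embedding taken from the last token's hidden state."""
    prompt = f"<|image_1|>\n{text}" if image is not None else text
    inputs = processor(
        prompt,
        images=[image] if image is not None else None,
        return_tensors="pt",
    ).to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True, return_dict=True)
    last_token = out.hidden_states[-1][:, -1, :]  # last layer, final ([EOS]) position
    return torch.nn.functional.normalize(last_token.float(), dim=-1)


# Cosine similarity between a text query and an image candidate.
query = embed("Find a photo of a golden retriever.")
candidate = embed("Represent the given image.", Image.open("dog.jpg"))
print((query @ candidate.T).item())
```

For reproducing the paper's numbers, use the official codebase and its task instructions rather than this sketch.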
## Release
<details>
<summary> V1 checkpoints </summary>

- [VLM2Vec-Qwen2VL (7B)](https://huggingface.co/TIGER-Lab/VLM2Vec-Qwen2VL-7B)
- [VLM2Vec-Qwen2VL (2B)](https://huggingface.co/TIGER-Lab/VLM2Vec-Qwen2VL-2B)
- [VLM2Vec-LLaVa-Next](https://huggingface.co/TIGER-Lab/VLM2Vec-LLaVa-Next)
- [VLM2Vec-Phi3.5V](https://huggingface.co/TIGER-Lab/VLM2Vec-Full)
</details>

<details>
<summary> V2 checkpoints </summary>

- [VLM2Vec-v2.0 (Qwen2VL-2B)](https://huggingface.co/VLM2Vec/VLM2Vec-V2.0)
</details>

### Github
- [Github](https://github.com/TIGER-AI-Lab/VLM2Vec)

## Experimental Results
Our model outperforms existing baselines by a large margin.
<img width="900" alt="abs" src="vlm2vec_v1_result.png">