ziyjiang committed (verified)
Commit 3161a2b · 1 parent: 5813911

Update README.md

Files changed (1): README.md (+10 −2)
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 
 # VLM2Vec
 
-This repo contains the model checkpoint for [VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks](https://arxiv.org/abs/2410.05160). In this paper, we aimed at building a unified multimodal embedding model for any tasks. Our model is based on converting an existing well-trained VLM (Phi-3.5-V) into an embedding model. The basic idea is to add an [EOS] token in the end of the sequence, which will be used as the representation of the multimodal inputs.
+This repo contains the VLM2Vec-Phi3.5V model checkpoint for [VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks](https://arxiv.org/abs/2410.05160). In this paper, we aim to build a unified multimodal embedding model for arbitrary tasks. Our model converts an existing, well-trained VLM (Phi-3.5-V) into an embedding model: an [EOS] token is appended to the end of the input sequence, and its hidden state serves as the representation of the multimodal input.
 
 
 ## Release
@@ -28,16 +28,24 @@ Our model is being trained on MMEB-train and evaluated on MMEB-eval with contras
 
 <details>
 <summary> V1 checkpoints </summary>
+
 - [VLM2Vec-Qwen2VL (7B)](https://huggingface.co/TIGER-Lab/VLM2Vec-Qwen2VL-7B)
 - [VLM2Vec-Qwen2VL (2B)](https://huggingface.co/TIGER-Lab/VLM2Vec-Qwen2VL-2B)
 - [VLM2Vec-LLaVa-Next](https://huggingface.co/TIGER-Lab/VLM2Vec-LLaVa-Next)
 - [VLM2Vec-Phi3.5V](https://huggingface.co/TIGER-Lab/VLM2Vec-Full)
 </details>
 
+<details>
+<summary> V2 checkpoints </summary>
+
+- [VLM2Vec-v2.0 (Qwen2VL-2B)](https://huggingface.co/VLM2Vec/VLM2Vec-V2.0)
+</details>
+
+
 ### Github
 - [Github](https://github.com/TIGER-AI-Lab/VLM2Vec)
 
-### Experimental Results
+## Experimental Results
 Our model can outperform the existing baselines by a huge margin.
 <img width="900" alt="abs" src="vlm2vec_v1_result.png">
 
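
For context on the [EOS]-pooling approach described in the updated paragraph, here is a minimal sketch of last-token pooling with the Hugging Face `transformers` API. This is not the repository's actual code: the backbone ID, processor call, and pooling details are assumptions, and the real implementation (prompt templates, image handling, contrastive training) lives in [TIGER-AI-Lab/VLM2Vec](https://github.com/TIGER-AI-Lab/VLM2Vec).

```python
# Rough sketch of last-token ([EOS]) pooling for multimodal embeddings.
# Assumptions: Phi-3.5-vision-instruct as the backbone and the standard
# transformers API; this is illustrative, not the VLM2Vec training/inference code.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Phi-3.5-vision-instruct"  # assumed backbone checkpoint
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval()

@torch.no_grad()
def embed(text: str, images=None) -> torch.Tensor:
    """Encode text (optionally with images) and return an L2-normalized
    embedding taken from the hidden state of the final token."""
    inputs = processor(text=text, images=images, return_tensors="pt")
    outputs = model(**inputs, output_hidden_states=True)
    last_hidden = outputs.hidden_states[-1]   # (batch, seq_len, hidden_dim)
    pooled = last_hidden[:, -1, :]            # hidden state at the final ([EOS]) position
    return torch.nn.functional.normalize(pooled, dim=-1)

# A retrieval score between a query and a candidate is then a simple dot product:
# score = (embed("a dog playing on the beach") @ embed("a puppy running on sand").T).item()
```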