TIGER-Lab/VLM2Vec-LLaVa-Next

ziyjiang committed on Dec 19, 2024

Commit 0748cfb · verified · 1 Parent(s): f759c6b

Update README.md

Files changed (1)

README.md +0 -68

@@ -33,75 +33,7 @@ VLM2Vec-LlaVa-Next could outperform the baselines and other version of VLM2Vec b
 ## How to use VLM2Vec-LlaVa-Next
-First you can clone our github
-```bash
-git clone https://github.com/TIGER-AI-Lab/VLM2Vec.git
-```
-Then you can enter the directory to run the following command.
-```python
-from src.model import MMEBModel
-from src.arguments import ModelArguments
-from src.utils import load_processor
-import torch
-from transformers import HfArgumentParser, AutoProcessor
-from PIL import Image
-import numpy as np
-model_args = ModelArguments(
-    model_name='TIGER-Lab/VLM2Vec-LLaVa-Next',
-    pooling='last',
-    normalize=True,
-    model_backbone='llava')
-model = MMEBModel.load(model_args)
-model.eval()
-model = model.to('cuda', dtype=torch.bfloat16)
-processor = load_processor(model_args)
-# Image + Text -> Text
-inputs = processor('<image> Represent the given image with the following question: What is in the image', [Image.open('figures/example.jpg')])
-inputs = {key: value.to('cuda') for key, value in inputs.items()}
-qry_output = model(qry=inputs)["qry_reps"]
-string = 'A cat and a dog'
-inputs = processor(string)
-inputs = {key: value.to('cuda') for key, value in inputs.items()}
-tgt_output = model(tgt=inputs)["tgt_reps"]
-print(string, '=', model.compute_similarity(qry_output, tgt_output))
-## A cat and a dog = tensor([[0.2969]], device='cuda:0', dtype=torch.bfloat16)
-string = 'A cat and a tiger'
-inputs = processor(string)
-inputs = {key: value.to('cuda') for key, value in inputs.items()}
-tgt_output = model(tgt=inputs)["tgt_reps"]
-print(string, '=', model.compute_similarity(qry_output, tgt_output))
-## A cat and a tiger = tensor([[0.2080]], device='cuda:0', dtype=torch.bfloat16)
-# Text -> Image
-inputs = processor('Find me an everyday image that matches the given caption: A cat and a dog.',)
-inputs = {key: value.to('cuda') for key, value in inputs.items()}
-qry_output = model(qry=inputs)["qry_reps"]
-string = '<image> Represent the given image.'
-inputs = processor(string, [Image.open('figures/example.jpg')])
-inputs = {key: value.to('cuda') for key, value in inputs.items()}
-tgt_output = model(tgt=inputs)["tgt_reps"]
-print(string, '=', model.compute_similarity(qry_output, tgt_output))
-## <|image_1|> Represent the given image. = tensor([[0.3105]], device='cuda:0', dtype=torch.bfloat16)
-inputs = processor('Find me an everyday image that matches the given caption: A cat and a tiger.',)
-inputs = {key: value.to('cuda') for key, value in inputs.items()}
-qry_output = model(qry=inputs)["qry_reps"]
-string = '<image> Represent the given image.'
-inputs = processor(string, [Image.open('figures/example.jpg')])
-inputs = {key: value.to('cuda') for key, value in inputs.items()}
-tgt_output = model(tgt=inputs)["tgt_reps"]
-print(string, '=', model.compute_similarity(qry_output, tgt_output))
-## <|image_1|> Represent the given image. = tensor([[0.2158]], device='cuda:0', dtype=torch.bfloat16)
 ```
 ## Citation

 ## How to use VLM2Vec-LlaVa-Next
 ```
 ## Citation
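
The snippet removed in this commit configures the model with `pooling='last'` and `normalize=True`, then compares query and target embeddings. As a rough illustration of what such settings commonly mean, the NumPy sketch below picks the hidden state of the last non-padding token, L2-normalizes it, and scores pairs by dot product (which equals cosine similarity for unit vectors). This is an assumption about the general technique, not the repository's `MMEBModel` code; the helper names `last_token_pool`, `l2_normalize`, and `cosine_scores` are hypothetical, and right-padded inputs are assumed.

```python
import numpy as np

def last_token_pool(hidden, attention_mask):
    """Return the hidden state of the last non-padding token per sequence.

    hidden: (batch, seq_len, dim) array of token states.
    attention_mask: (batch, seq_len) array of 0/1, assumed right-padded
    (an assumption of this sketch, not taken from the repository).
    """
    last_idx = attention_mask.sum(axis=1) - 1
    return hidden[np.arange(hidden.shape[0]), last_idx]

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length; eps guards against division by zero."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def cosine_scores(qry, tgt):
    """Pairwise cosine similarity between query rows and target rows."""
    return l2_normalize(qry) @ l2_normalize(tgt).T

# Toy data standing in for model outputs (random values, not real embeddings).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 0],   # first sequence has one padding token
                 [1, 1, 1, 1]])  # second sequence has no padding
reps = last_token_pool(hidden, mask)
scores = cosine_scores(reps, reps)
print(scores.shape)
```

Because every row is normalized first, each embedding scored against itself gives 1 up to floating-point error, and all scores fall in the range -1 to 1.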