---
language:
- en
tags:
- image-to-text
---
## lokibots/vit-patch16-1280-gpt2-large-image-summary
This model generates a text summary of a chart image. It accepts an image of up to 1280x768 pixels and produces a summary describing the image's contents. **However, the model still requires further training.**
## Sample inference code
```python
from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, GPT2Tokenizer
from PIL import Image

# Load the encoder-decoder model, its feature extractor, and the GPT-2 tokenizer
model = VisionEncoderDecoderModel.from_pretrained("lokibots/vit-patch16-1280-gpt2-large-image-summary")
feature_extractor = ViTFeatureExtractor.from_pretrained("lokibots/vit-patch16-1280-gpt2-large-image-summary")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")

# Replace "image_file" with the path to a chart image
image = Image.open("image_file").convert("RGB")
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values

# Generate the summary with beam search and decode the token IDs to text
gen_kwargs = {"max_length": 1024, "num_beams": 4}
output_ids = model.generate(pixel_values, **gen_kwargs)
preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```
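
Because the model is described as accepting images of 1280x768 or less, it may help to compute a target size for larger images before inference. The following is a minimal sketch, not part of the original model card: `fit_within` is a hypothetical helper, and it assumes that 1280 is the maximum width, 768 is the maximum height, and that downscaling while keeping the aspect ratio is acceptable. The feature extractor may already resize inputs on its own, in which case this step is unnecessary.

```python
def fit_within(width, height, max_width=1280, max_height=768):
    """Return a (width, height) pair that fits inside the given bounds.

    Hypothetical helper: assumes 1280 is the maximum width and 768 the
    maximum height. The aspect ratio is kept and images are never upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale factor required by the tighter axis, capped at 1.0 (no upscaling)
    scale = min(max_width / width, max_height / height, 1.0)
    # Round to whole pixels, clamp to the bounds, keep at least one pixel
    new_width = max(1, min(max_width, round(width * scale)))
    new_height = max(1, min(max_height, round(height * scale)))
    return new_width, new_height
```

With Pillow, whose `Image.size` is a `(width, height)` tuple, the result could be applied as `image = image.resize(fit_within(*image.size))` before calling the feature extractor.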