Models: 574

Active filters: visual-question-answering

Salesforce/blip2-opt-2.7b

Image-Text-to-Text • 4B • Updated Feb 3 • 866k • 400

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated May 1 • 273k • 1.45k

Salesforce/blip2-flan-t5-xl

Image-Text-to-Text • 4B • Updated Feb 3 • 187k • 81

Salesforce/blip2-flan-t5-xxl

Image-Text-to-Text • 12B • Updated Feb 3 • 7.05k • 92

google/pix2struct-ocrvqa-base

Visual Question Answering • Updated May 19, 2023 • 20 • 3

google/matcha-chartqa

Visual Question Answering • Updated Jul 22, 2023 • 655 • 47

google/cxr-foundation

Image Classification • Updated Feb 20 • 112 • 83

Foreshhh/Qwen2-VL-7B-SafeRLHF

Visual Question Answering • 8B • Updated Dec 22, 2024 • 891 • 3

omlab/VLM-R1-Qwen2.5VL-3B-Math-0305

Visual Question Answering • 4B • Updated Apr 14 • 1.62k • 6

remyxai/SpaceThinker-Qwen2.5VL-3B

Image-Text-to-Text • 4B • Updated about 1 month ago • 6.23k • 25

Lexius/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated Jun 2 • 3.13k • 1

dandelin/vilt-b32-finetuned-vqa

Visual Question Answering • Updated Aug 2, 2022 • 42.5k • 412

azwierzc/vilt-b32-finetuned-vqa-pl

Visual Question Answering • Updated Mar 21, 2022 • 12

Bingsu/temp_vilt_vqa

Visual Question Answering • Updated Nov 28, 2022 • 8

microsoft/git-base-vqav2

Visual Question Answering • 0.2B • Updated Mar 9, 2024 • 228 • 19

microsoft/git-base-textvqa

Visual Question Answering • 0.2B • Updated Mar 29, 2024 • 982 • 6

Salesforce/blip-vqa-base

Visual Question Answering • 0.4B • Updated Feb 3 • 392k • 165

Salesforce/blip-vqa-capfilt-large

Visual Question Answering • Updated Feb 3 • 75.7k • 52

tufa15nik/vilt-finetuned-vqasi

Visual Question Answering • Updated Dec 15, 2022 • 10

microsoft/git-large-vqav2

Visual Question Answering • 0.4B • Updated Sep 7, 2023 • 2.17k • 18

microsoft/git-large-textvqa

Visual Question Answering • 0.4B • Updated Apr 9, 2024 • 159 • 4

ivelin/donut-refexp-combined-v1

Visual Question Answering • Updated Feb 7, 2023 • 8 • 4

tifa-benchmark/promptcap-coco-vqa

Image-to-Text • Updated Dec 11, 2023 • 139 • 12

sheldonxxxx/OFA_model_weights

Visual Question Answering • Updated Feb 8, 2023 • 1

Salesforce/blip2-opt-6.7b

Image-Text-to-Text • 8B • Updated Feb 3 • 5.71k • 78

Salesforce/blip2-opt-2.7b-coco

Image-to-Text • 4B • Updated Feb 3 • 10.4k • 9

Salesforce/blip2-opt-6.7b-coco

Image-Text-to-Text • 8B • Updated Feb 3 • 78.7k • 34

Salesforce/blip2-flan-t5-xl-coco

Image-to-Text • 4B • Updated Feb 3 • 1.56k • 16

google/pix2struct-widget-captioning-large

Visual Question Answering • 1B • Updated Apr 10, 2024 • 51 • 19

google/pix2struct-ai2d-base

Visual Question Answering • 0.3B • Updated Dec 24, 2023 • 1.14k • 43