Describe images, videos, and audio
LightGlue demo
EfficientVLM
Search and filter CVPR 2025 papers
Compare vision language models
Gemma 3 for license plate detection
A more robust benchmark for long video understanding
A space for the nanoVLM model
Detect objects in images from a URL or upload