Joy Caption Pre Alpha
Generate captions for images
Segment objects in images and videos using text prompts
Generate descriptions by uploading images or videos
Generate insights from charts using text prompts
Generate descriptions for images using text prompts
Upload an image to detect objects
Extract text and metadata from PDF files
Try PaliGemma on document understanding tasks
Generate image descriptions
Chat with an AI that understands images and text
Ask questions about images and get detailed answers
A GPT-4o-like bot.
Analyze and extract text from documents
Generate detailed descriptions from images and videos
Generate retrieval queries from document images
Microsoft Phi-3 Vision 128k with multimodal capabilities
A Fully Open Multilingual Multimodal LLM for 39 Languages
Demo for DocLayout-YOLO
A data extraction tool to convert PDF to Markdown and JSON
Extract text from images
Hugging Face Space for JanusFlow-1.3B
Upload documents for Q&A
Generate clickable coordinates on a screenshot
PaliGemma2 LoRA finetuned on VQAv2
Gaze detection using Moondream
Detect and annotate human poses in images and videos
olmocr / nanonets ocr / qwen2vl ocr / rolmocr / aya-vision
Extract text from images and PDFs
OmniParser: turn your LLM into a GUI agent
See, read, and reason: better together.
Generate text or segment objects from an image
Interact with the Aya family of models.
Interact with videos!
Classify images in real-time using your webcam
OCR for PDFs and Images using Mistral OCR
Upload an image to detect objects
Object Detection & Scene Understanding for Images and Video
Describe masked areas in images
Object Detection on Images and Video
Start camera and receive responses based on video feed
Seed1.5-VL API Demo
Demo for Nanonets-OCR
Chat with images, videos, or PDFs to generate text
THUDM/GLM-4.1V-9B-Thinking Demo