
ds4sd/SmolDocling-256M-preview • Image-Text-to-Text • 0.3B • Updated • 147k • 1.5k
Conversational speech generation
SOTA real-time object detection model
Identify and segment objects in images using text prompts, visual prompts, or no prompt at all
Convert images and text into structured documents
Generate video from images by simulating 3D camera control
Multi-image-to-3D generation