Image-Text-to-Text
Transformers
Safetensors
English
idefics3
multimodal
vision
conversational

Can this model support video understanding and multi-image understanding capabilities?

#21
by xJohn - opened

Can this model support video understanding and multi-image understanding capabilities?

xJohn changed discussion status to closed
xJohn changed discussion status to open
