Post
47
Try out the demo for Multimodal OCR featuring the implementation of models including
๐คMultimodal OCR Space : prithivMLmods/Multimodal-OCR
๐ฆThe models implemented in this Space are:
+ RolmOCR : reducto/RolmOCR
+ Qwen2VL OCR : prithivMLmods/Qwen2-VL-OCR-2B-Instruct
+ Qwen2VL OCR2 : prithivMLmods/Qwen2-VL-OCR2-2B-Instruct
RolmOCR
and Qwen2VL OCR
. The use case showcases image-to-text-to-text conversion and video understanding support for the RolmOCR model ! ๐๐คMultimodal OCR Space : prithivMLmods/Multimodal-OCR
๐ฆThe models implemented in this Space are:
+ RolmOCR : reducto/RolmOCR
+ Qwen2VL OCR : prithivMLmods/Qwen2-VL-OCR-2B-Instruct
[ or ]
+ Qwen2VL OCR2 : prithivMLmods/Qwen2-VL-OCR2-2B-Instruct
Qwen2VL OCR supports only image-text-to-text in the space.