Excited to share new models that perform exceptionally well in document OCR, image captioning, and visual understanding tasks. Megalodon-OCR and Perseus-Doc-VL have both demonstrated significant improvements across key areas. You can explore live demos on Hugging Face Spaces to compare their performance with other top-tier models available on the Hub.
Spaces & Models :
> Doc-VLMs-OCR : prithivMLmods/Doc-VLMs-OCR
> core-OCR : prithivMLmods/core-OCR
> Megalodon-OCR (3B) : prithivMLmods/Megalodon-OCR-Sync-0713
> Perseus-Doc-vl (7B) : prithivMLmods/Perseus-Doc-vl-0712
Datasets Caption Mix :
> Corvus-OCR-Caption-Mix : prithivMLmods/Corvus-OCR-Caption-Mix
> Corvus-OCR-Caption-Mini-Mix : prithivMLmods/Corvus-OCR-Caption-Mini-Mix
Collections :
> Corvus OCR Caption Mix : prithivMLmods/corvus-ocr-caption-mix-687349bfaceffbd10976f0cc
> Captioning / OCR / DocTable : prithivMLmods/captioning-ocr-doctable-687382e1da822008bb5c06f2
GitHub :
> OCR-ReportLab : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab/blob/main/Megalodon-OCR-Sync-0713-ColabNotebook/Megalodon_OCR_Sync_0713_ReportLab.ipynb
Other Spaces :
> Multimodal-OCR : prithivMLmods/Multimodal-OCR
> Multimodal-VLMs : prithivMLmods/Multimodal-VLMs
> Multimodal-OCR2 : prithivMLmods/Multimodal-OCR2
> Florence-2-Image-Caption : prithivMLmods/Florence-2-Image-Caption
> VisionScope-R2 : prithivMLmods/VisionScope-R2
> DocScope-R1 : prithivMLmods/DocScope-R1
.
.
.
To learn more, visit the model card of the respective model!