prithivMLmods
posted an update about 1 month ago
Try the Hugging Face Space demo for Logics-MLLM/Logics-Parsing, the latest multimodal VLM from the Logics Team at Alibaba Group. It enables end-to-end document parsing with precise content extraction in markdown format, and it also generates a clean HTML representation of the document while preserving its logical structure. 🤗🔥

Additionally, I've integrated one of my recent works, prithivMLmods/Gliese-OCR-7B-Post1.0, which also excels at document comprehension.

⭐ Space / App : prithivMLmods/VLM-Parsing
📄 Technical Report by the Logics Team, Alibaba Group : Logics-Parsing Technical Report (2509.19760)
🖖 MM: VLM-Parsing: prithivMLmods/mm-vlm-parsing-68e33e52bfb9ae60b50602dc
⚡ Collections : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Other Pages:

➔ Multimodal VLMs - July'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
➔ Multimodal VLMs - Aug'25 : prithivMLmods/multimodal-vlms-aug25-68a56aac39fe8084f3c168bd
➔ VL caption - < Sep 15 '25 : prithivMLmods/vl-caption-sep-15-25-68c7f6d737985c63c13e2391

To learn more, visit the app page or the respective model page!