AI & ML interests

Traditional Chinese language models

Recent Activity

prithivMLmods 
posted an update about 23 hours ago
view post
Post
700
Dropping the general-purpose reasoning dataset Poseidon-Reasoning-5M, which supports general thought processes, math, and science — featuring a diverse mixture of domains 🌊 : prithivMLmods/Poseidon-Reasoning-5M

from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Poseidon-Reasoning-5M", split="data")

The compact version is as follows — Poseidon-Reasoning-Mini-300K : prithivMLmods/Poseidon-Reasoning-Mini-300K


from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Poseidon-Reasoning-Mini-300K", split="train")


Collection : prithivMLmods/poseidon-reasoning-6879ca98e118b307c781a9ba
prithivMLmods 
posted an update 5 days ago
view post
Post
2095
Open Omega Ω (Forge, Atom, Explora):
A Fusion of Math, Science, and Coding 🧪🤗

Datasets :
⌯⌲ Open-Omega-Forge-1M [Mathematics, Coding, and Science]: prithivMLmods/Open-Omega-Forge-1M
⌯⌲ Open-Omega-Atom-1.5M [Mathematics and Science]: prithivMLmods/Open-Omega-Atom-1.5M
⌯⌲ Open-Omega-Explora-2.5M [Forge + Atom]: prithivMLmods/Open-Omega-Explora-2.5M
⌯⌲ Others [Subordinate portion] - Curated and blended modular dataset.

Models :
> Omega-Qwen3-Atom-8B : prithivMLmods/Omega-Qwen3-Atom-8B
> Omega-Qwen2.5-Coder-3B : prithivMLmods/Omega-Qwen2.5-Coder-3B

Dataset Collection: prithivMLmods/open-omega-a-fusion-of-math-science-and-coding-68756c37769fa39c4055cc0e

.
.
.
For more information, refer to the dataset card(s).

prithivMLmods 
posted an update 6 days ago
view post
Post
3753
Excited to bring the new models that are performing exceptionally well in document OCR, image captioning, and visual understanding tasks. Megalodon-OCR and Perseus-Doc-VL have both demonstrated significant improvements across key areas. You can explore live demos on Hugging Face Spaces to compare their performance with other top-tier models available on the hub. 🤗📄

Models & Spaces :
> Megalodon-OCR (3B) : prithivMLmods/Megalodon-OCR-Sync-0713
> Perseus-Doc-vl (7B): prithivMLmods/Perseus-Doc-vl-0712
> Doc-VLMs-OCR : prithivMLmods/Doc-VLMs-OCR
> core-OCR : prithivMLmods/core-OCR


Datasets Caption Mix :
> Corvus-OCR-Caption-Mix : prithivMLmods/Corvus-OCR-Caption-Mix
> Corvus-OCR-Caption-Mini-Mix : prithivMLmods/Corvus-OCR-Caption-Mini-Mix

Collections :
> Corvus OCR Caption Mix: prithivMLmods/corvus-ocr-caption-mix-687349bfaceffbd10976f0cc
> Captioning / OCR / DocTable : prithivMLmods/captioning-ocr-doctable-687382e1da822008bb5c06f2

GitHub :
> OCR-ReportLab : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab/blob/main/Megalodon-OCR-Sync-0713-ColabNotebook/Megalodon_OCR_Sync_0713_ReportLab.ipynb

Others Spaces :
> Multimodal-OCR : prithivMLmods/Multimodal-OCR
> Multimodal-VLMs : prithivMLmods/Multimodal-VLMs
> Multimodal-OCR2 : prithivMLmods/Multimodal-OCR2
> Florence-2-Image-Caption : prithivMLmods/Florence-2-Image-Caption
> VisionScope-R2 : prithivMLmods/VisionScope-R2
> DocScope-R1 : prithivMLmods/DocScope-R1

.
.
.
To know more about it, visit the model card of the respective model. !!
prithivMLmods 
posted an update 10 days ago
view post
Post
2344
Demo of OCR & Math QA using multi-capable VLMs like MonkeyOCR-pro-1.2B, R1-One-Vision, VisionaryR1, Vision Matters-7B, and VIGAL-7B, all running together with support for both image and video inference. 🪐

✦ Demo Spaces :
⤷ Multimodal VLMs : prithivMLmods/Multimodal-VLMs

✦ Models :
⤷ Visionary R1 : maifoundations/Visionary-R1
⤷ MonkeyOCR [1.2B] : echo840/MonkeyOCR-pro-1.2B
⤷ ViGaL 7B : yunfeixie/ViGaL-7B
⤷ Lh41-1042-Magellanic-7B-0711 : prithivMLmods/Lh41-1042-Magellanic-7B-0711
⤷ Vision Matters 7B : Yuting6/Vision-Matters-7B
⤷ WR30a-Deep-7B-0711 : prithivMLmods/WR30a-Deep-7B-0711

✦ MonkeyOCR-pro-1.2B Colab T4 Demo [ notebook ]
⤷ MonkeyOCR-pro-1.2B-ReportLab : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab/blob/main/MonkeyOCR-0709/MonkeyOCR-pro-1.2B-ReportLab.ipynb

✦ GitHub : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab

The community GPU grant was given by Hugging Face — special thanks to them.🤗🚀

.
.
.
To know more about it, visit the model card of the respective model. !!
prithivMLmods 
posted an update 17 days ago
view post
Post
3517
Multimodal OCR with ReportLab? On Colab T4? (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoo OCR 3B?) .. Yeah, it’s possible. I’ve made a dedicated Colab notebook to experiment with these models (all built on top of Qwen2.5 VL). 🤗🚀

Download notebooks here :

✦︎ NanonetsOCR : https://colab.research.google.com/drive/1VvA-amvSVxGdWgIsh4_by6KWOtEs_Iqp
✦︎ MonkeyOCR : https://colab.research.google.com/drive/1vPCojbmlXjDFUt06FJ1tjgnj_zWK4mUo
✦︎ OCRFluxOCR : https://colab.research.google.com/drive/1TDoCXzWdF2hxVLbISqW6DjXAzOyI7pzf
✦︎ TyphoonOCR : https://colab.research.google.com/drive/1_59zvLNnn1kvbiSFxzA1WiqhpbW8RKbz

🜲 Github : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab

What does it do?

1. Performs OCR on the input image
2. Generates a DOCX or PDF file with the input image and the extracted text

.
.
.
To know more about it, visit the model card of the respective model. !!
prithivMLmods 
posted an update 18 days ago
view post
Post
1664
The bunch of comparable demos for Multimodal VLMs (excels in OCR, cinematography understanding, spatial reasoning, etc.) now up on the Hub 🤗 — max recent till Jun'25.

✦ Demo Spaces —

> [Nanonets-OCR-s, MonkeyOCR, Typhoon-OCR-7B, SmolDocling] : prithivMLmods/Multimodal-OCR2
> [GLM-4.1v, docscopeOCR-7B, MonkeyOCR, coreOCR-7B] : prithivMLmods/core-OCR
> [Camel-Doc-OCR, ViLaSR-7B, OCRFlux-3B, ShotVL-7B] : prithivMLmods/Doc-VLMs-v2-Localization
> [SkyCaptioner-V1, SpaceThinker-3B, coreOCR-7B, SpaceOm-3B] : prithivMLmods/VisionScope-R2
> [RolmOCR-7B, Qwen2-VL-OCR-2B, Aya-Vision-8B, Nanonets-OCR-s] : prithivMLmods/Multimodal-OCR
> [DREX-062225-7B, Typhoon-OCR-3B, olmOCR-7B-0225, VIREX-062225-7B] : prithivMLmods/Doc-VLMs-OCR
> [Cosmos-Reason1-7B, docscopeOCR-7B, Captioner-7B, visionOCR-3B] : prithivMLmods/DocScope-R1

✦ Space Collection : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

.
.
.
To know more about it, visit the model card of the respective model. !!
  • 1 reply
·
prithivMLmods 
posted an update 20 days ago
view post
Post
2421
The demo for Camel-Doc-OCR-062825 (exp) is optimized for document retrieval and direct Markdown (.md) generation from images and PDFs. Additional demos include OCRFlux-3B (document OCR), VilaSR (spatial reasoning with visual drawing), and ShotVL (cinematic language understanding). 🐪

✦ Space : prithivMLmods/Doc-VLMs-v2-Localization

Models :
⤷ camel-doc-ocr-062825 : prithivMLmods/Camel-Doc-OCR-062825
⤷ ocrflux-3b : ChatDOC/OCRFlux-3B
⤷ vilasr : AntResearchNLP/ViLaSR
⤷ shotvl : Vchitect/ShotVL-7B

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

The community GPU grant was given by Hugging Face — special thanks to them. This space supports the following tasks: (image inference, video inference) with result markdown canvas and object detection/localization. 🤗🚀

.
.
.
To know more about it, visit the model card of the respective model. !!
prithivMLmods 
posted an update 25 days ago
view post
Post
1981
The demo for DREX-062225-exp (Document Retrieval and Extraction eXpert ~ experimental) / typhoon-ocr-3b (a bilingual document parsing model built specifically for real-world documents) / VIREX-062225-exp (Video Information Retrieval and Extraction eXpert ~ experimental) / olmOCR-7B-0225-preview (the document parsing model based on Qwen2VL). 🤗

✦ Demo : prithivMLmods/Doc-VLMs-OCR ~ ( with .md canvas )

⤷ DREX-062225-exp : prithivMLmods/DREX-062225-exp
⤷ typhoon-ocr-3b : scb10x/typhoon-ocr-3b
⤷ VIREX-062225-exp : prithivMLmods/VIREX-062225-exp
⤷ olmOCR-7B-0225-preview : allenai/olmOCR-7B-0225-preview

⤷ Collection : prithivMLmods/doc-vl-685839064a863e1cd23be3f1
⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
.
.
.

To know more about it, visit the model card of the respective model. !!
·
prithivMLmods 
posted an update 26 days ago
view post
Post
2698
Updated the docscopeOCR-7B-050425-exp with the DREX-062225-exp, with improved preciseness in table structure and line spacing in the markdown used on the document page. And though this is still an experimental one, it's expected to perform well in the defined DREX use cases [ Document Retrieval and Extraction eXpert – experimental ocr ]. 💻

⤷ Model : prithivMLmods/DREX-062225-exp
⤷ Demo : prithivMLmods/Doc-VLMs-OCR

⤷ Collection : prithivMLmods/doc-vl-685839064a863e1cd23be3f1
⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
⤷ Git : https://github.com/PRITHIVSAKTHIUR/DREX.git
.
.
.

To know more about it, visit the model card of the respective model. !!
prithivMLmods 
posted an update 30 days ago
view post
Post
1918
The demo for smoldocling / nanonets ocr / typhoon ocr / monkey ocr explores the document OCR capabilities of various newly released multimodal VLMs in a single space. And if you're experiencing or demoing long document image OCR, kindly use the Smoldocling 256M preview [ Smoldocling is back in demo here. ] 🤗.

✦ Try the demo here : prithivMLmods/Multimodal-OCR2

⤷ MonkeyOCR Recognition : echo840/MonkeyOCR
⤷ Nanonets-OCR-s : nanonets/Nanonets-OCR-s
⤷ SmolDocling-256M-preview : ds4sd/SmolDocling-256M-preview
⤷ typhoon-ocr-7b : scb10x/typhoon-ocr-7b

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

⤷ Github : https://github.com/PRITHIVSAKTHIUR/Multimodal-OCR2


The community GPU grant was given by Hugging Face — special thanks to them. 🤗🚀



To know more about it, visit the model card of the respective model. !!
  • 2 replies
·
prithivMLmods 
posted an update about 1 month ago
view post
Post
3897
The demo for the MonkeyOCR Recognition model, which adopts a Structure-Recognition-Relation (SRR) triplet paradigm & Nanonets-OCR-s a powerful, state-of-the-art image-to-markdown OCR model that goes far beyond traditional text extraction and other experimental document OCR models, is combined into a single space.

✦ Try the demo here : prithivMLmods/core-OCR
✦ Try Nanonets-OCR-s demo here : prithivMLmods/Multimodal-OCR

⤷ MonkeyOCR Recognition : echo840/MonkeyOCR
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ coreOCR-7B-050325-preview : prithivMLmods/coreOCR-7B-050325-preview
⤷ Nanonets-OCR-s : nanonets/Nanonets-OCR-s

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Also, include a sample OCR test using the VisionOCR-3B-061125 model and the Qwen2-VL-OCR-2B-Instruct model.
⤷ Blog : https://huggingface.co/blog/prithivMLmods/visionocr-3b-061125-vs-qwen2-vl-ocr-2b-instruct

To know more about it, visit the model card of the respective model. !!
prithivMLmods 
posted an update about 2 months ago
view post
Post
5750
OpenAI, Google, Hugging Face, and Anthropic have released guides and courses on building agents, prompting techniques, scaling AI use cases, and more. Below are 10+ minimalistic guides and courses that may help you in your progress. 📖

⤷ Agents Companion : https://www.kaggle.com/whitepaper-agent-companion
⤷ Building Effective Agents : https://www.anthropic.com/engineering/building-effective-agents
⤷ Guide to building agents by OpenAI : https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
⤷ Prompt engineering by Google : https://www.kaggle.com/whitepaper-prompt-engineering
⤷ Google: 601 real-world gen AI use cases : https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders
⤷ Prompt engineering by IBM : https://www.ibm.com/think/topics/prompt-engineering-guide
⤷ Prompt Engineering by Anthropic : https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
⤷ Scaling AI use cases : https://cdn.openai.com/business-guides-and-resources/identifying-and-scaling-ai-use-cases.pdf
⤷ Prompting Guide 101 : https://services.google.com/fh/files/misc/gemini-for-google-workspace-prompting-guide-101.pdf
⤷ AI in the Enterprise by OpenAI : https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf

by HF🤗 :
⤷ AI Agents Course by Huggingface : https://huggingface.co/learn/agents-course/unit0/introduction
⤷ Smol-agents Docs : https://huggingface.co/docs/smolagents/en/tutorials/building_good_agents
⤷ MCP Course by Huggingface : https://huggingface.co/learn/mcp-course/unit0/introduction
⤷ Other Course (LLM, Computer Vision, Deep RL, Audio, Diffusion, Cookbooks, etc..) : https://huggingface.co/learn
  • 2 replies
·
prithivMLmods 
posted an update about 2 months ago
view post
Post
2332
Just made a demo for Cosmos-Reason1, a physical AI model that understands physical common sense and generates appropriate embodied decisions in natural language through long chain-of-thought reasoning. Also added video understanding support to it. 🤗🚀

✦ Try the demo here : prithivMLmods/DocScope-R1

⤷ Cosmos-Reason1-7B : nvidia/Cosmos-Reason1-7B
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ Captioner-Relaxed : Ertugrul/Qwen2.5-VL-7B-Captioner-Relaxed

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

⤷ GitHub :
https://github.com/PRITHIVSAKTHIUR/Cosmos-x-DocScope
https://github.com/PRITHIVSAKTHIUR/Nvidia-Cosmos-Reason1-Demo.

To know more about it, visit the model card of the respective model. !!