Hacked together a way to log trl GRPO training completions to a 🤗 dataset repo. This allows you to:
- Track rewards from multiple reward functions
- Treat the completions and rewards from training as a "proper" dataset and do EDA
- Share results for open science
The implementation is super hacky, but I'm curious if people would find this useful.
To push completions to the Hub, you just need two extra parameters:
Google just released PaliGemma 2 Mix: new versatile instruction-tuned vision-language models 🔥
> Three new models: 3B, 10B, and 28B, at 224 and 448 resolution
> Can do vision-language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯