AI & ML interests

A central place for AI creators who want to use the different AI tools that Hugging Face provides for their film creations

Recent Activity

Zaesar updated a collection (Image) 1 day ago
Zaesar updated a Space (AIFILMS/README) 1 day ago
Zaesar updated a collection (Speech) 2 days ago

prithivMLmods posted an update 3 days ago
The demo covers DREX-062225-exp (Document Retrieval and Extraction eXpert, experimental), typhoon-ocr-3b (a bilingual document parsing model built specifically for real-world documents), VIREX-062225-exp (Video Information Retrieval and Extraction eXpert, experimental), and olmOCR-7B-0225-preview (a document parsing model based on Qwen2-VL). 🤗

✦ Demo : prithivMLmods/Doc-VLMs-OCR ~ ( with .md canvas )

⤷ DREX-062225-exp : prithivMLmods/DREX-062225-exp
⤷ typhoon-ocr-3b : scb10x/typhoon-ocr-3b
⤷ VIREX-062225-exp : prithivMLmods/VIREX-062225-exp
⤷ olmOCR-7B-0225-preview : allenai/olmOCR-7B-0225-preview

⤷ Collection : prithivMLmods/doc-vl-685839064a863e1cd23be3f1
⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the model card of the respective model.
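
For local testing, a minimal sketch along these lines should work for the Qwen2-VL-based checkpoints; this assumes olmOCR-7B-0225-preview loads with the standard Qwen2-VL classes in transformers and that a simple "extract as Markdown" instruction is acceptable (the model card's own prompt format may differ):

```python
# Hedged sketch: document OCR with olmOCR-7B-0225-preview via transformers.
# Assumptions: the checkpoint uses the standard Qwen2-VL classes; if the repo
# does not bundle a processor config, load it from Qwen/Qwen2-VL-7B-Instruct.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "allenai/olmOCR-7B-0225-preview"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("page.png")  # a scanned document page (placeholder path)
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Extract the text of this page as Markdown."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```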

prithivMLmods posted an update 4 days ago
Updated docscopeOCR-7B-050425-exp to DREX-062225-exp, with improved precision in table structure and line spacing in the Markdown rendered for the document page. Although this is still experimental, it is expected to perform well in the defined DREX use cases [ Document Retrieval and Extraction eXpert, experimental OCR ]. 💻

⤷ Model : prithivMLmods/DREX-062225-exp
⤷ Demo : prithivMLmods/Doc-VLMs-OCR

⤷ Collection : prithivMLmods/doc-vl-685839064a863e1cd23be3f1
⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
⤷ Git : https://github.com/PRITHIVSAKTHIUR/DREX.git

To learn more, visit the model card of the respective model.
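
If you prefer to script the hosted demo instead of using the UI, the Space can be driven with gradio_client; the endpoint names and argument order are not documented here, so inspect them first:

```python
# Hedged sketch: inspecting the Doc-VLMs-OCR Space API with gradio_client.
# The endpoints and parameters are whatever the Space exposes; read them from
# view_api() before attempting any client.predict(...) call.
from gradio_client import Client

client = Client("prithivMLmods/Doc-VLMs-OCR")
client.view_api()  # prints available endpoints and their expected arguments
```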

prithivMLmods posted an update 8 days ago
The demo for smoldocling / nanonets ocr / typhoon ocr / monkey ocr explores the document OCR capabilities of several newly released multimodal VLMs in a single space. If you're running OCR on long document images, use the SmolDocling-256M preview [ SmolDocling is back in the demo here. ] 🤗

✦ Try the demo here : prithivMLmods/Multimodal-OCR2

⤷ MonkeyOCR Recognition : echo840/MonkeyOCR
⤷ Nanonets-OCR-s : nanonets/Nanonets-OCR-s
⤷ SmolDocling-256M-preview : ds4sd/SmolDocling-256M-preview
⤷ typhoon-ocr-7b : scb10x/typhoon-ocr-7b

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

⤷ GitHub : https://github.com/PRITHIVSAKTHIUR/Multimodal-OCR2


The community GPU grant was given by Hugging Face — special thanks to them. 🤗🚀



To learn more, visit the model card of the respective model.
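
For long-document pages, a rough local recipe for SmolDocling-256M-preview could look like the sketch below, assuming the checkpoint loads with transformers' Vision2Seq auto classes and emits DocTags markup (converting DocTags to Markdown needs the separate docling-core package):

```python
# Hedged sketch: running SmolDocling-256M-preview locally.
# Assumptions: AutoModelForVision2Seq loads the checkpoint and the output is
# DocTags markup; post-processing to Markdown is left to docling-core.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ds4sd/SmolDocling-256M-preview"
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

image = Image.open("long_page.png")  # placeholder path to a document page
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Convert this page to docling."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)

generated = model.generate(**inputs, max_new_tokens=4096)
doctags = processor.batch_decode(generated[:, inputs["input_ids"].shape[1]:],
                                 skip_special_tokens=False)[0]
print(doctags)  # DocTags markup for the page
```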

multimodalart posted an update 9 days ago
Self-Forcing, a real-time video model distilled from Wan 2.1 by @adobe, is out, and they open-sourced it 🐐

I've built a live, real-time demo on Spaces 📹💨

multimodalart/self-forcing

prithivMLmods posted an update 10 days ago
The demo combines the MonkeyOCR Recognition model, which adopts a Structure-Recognition-Relation (SRR) triplet paradigm, Nanonets-OCR-s, a powerful, state-of-the-art image-to-markdown OCR model that goes far beyond traditional text extraction, and other experimental document OCR models into a single space.

✦ Try the demo here : prithivMLmods/core-OCR
✦ Try Nanonets-OCR-s demo here : prithivMLmods/Multimodal-OCR

⤷ MonkeyOCR Recognition : echo840/MonkeyOCR
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ coreOCR-7B-050325-preview : prithivMLmods/coreOCR-7B-050325-preview
⤷ Nanonets-OCR-s : nanonets/Nanonets-OCR-s

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Also included is a sample OCR comparison between the VisionOCR-3B-061125 model and the Qwen2-VL-OCR-2B-Instruct model.
⤷ Blog : https://huggingface.co/blog/prithivMLmods/visionocr-3b-061125-vs-qwen2-vl-ocr-2b-instruct

To learn more, visit the model card of the respective model.
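
As a rough local sketch, Nanonets-OCR-s should also run with transformers' generic image-text-to-text classes; this assumes the checkpoint ships a chat template and accepts a plain "extract as Markdown" instruction (the longer structured-output prompt from its model card is preferable in practice):

```python
# Hedged sketch: image-to-markdown OCR with Nanonets-OCR-s.
# Assumptions: AutoModelForImageTextToText loads the checkpoint; the short
# instruction below stands in for the model card's recommended prompt.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "nanonets/Nanonets-OCR-s"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("invoice.png")  # placeholder path
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Extract the text of this document as Markdown."}]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=2048)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```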

reach-vb posted an update 15 days ago
Excited to onboard FeatherlessAI on Hugging Face as an Inference Provider - they bring a fleet of 6,700+ LLMs on-demand on the Hugging Face Hub 🤯

Starting today, you'll be able to access all of those LLMs (OpenAI-compatible) on HF model pages and via OpenAI client libraries too! 💥

Go, play with it today: https://huggingface.co/blog/inference-providers-featherless

P.S. They're also bringing on more GPUs to support all your concurrent requests!
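
A minimal way to try it from Python, assuming the provider id is "featherless-ai" in huggingface_hub and that the example model below is actually served there (the blog above also shows the plain OpenAI-SDK route):

```python
# Hedged sketch: routing a chat completion through the Featherless AI provider.
# Assumptions: provider id "featherless-ai"; the model id below is a
# hypothetical choice for illustration, so swap in any Featherless-served LLM.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(provider="featherless-ai", api_key=os.environ["HF_TOKEN"])
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "In one sentence, what is an inference provider?"}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```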

prithivMLmods posted an update 28 days ago
OpenAI, Google, Hugging Face, Anthropic, and others have released guides and courses on building agents, prompting techniques, scaling AI use cases, and more. Below are 10+ concise guides and courses that may help you make progress. 📖

⤷ Agents Companion : https://www.kaggle.com/whitepaper-agent-companion
⤷ Building Effective Agents : https://www.anthropic.com/engineering/building-effective-agents
⤷ Guide to building agents by OpenAI : https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
⤷ Prompt engineering by Google : https://www.kaggle.com/whitepaper-prompt-engineering
⤷ Google: 601 real-world gen AI use cases : https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders
⤷ Prompt engineering by IBM : https://www.ibm.com/think/topics/prompt-engineering-guide
⤷ Prompt Engineering by Anthropic : https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
⤷ Scaling AI use cases : https://cdn.openai.com/business-guides-and-resources/identifying-and-scaling-ai-use-cases.pdf
⤷ Prompting Guide 101 : https://services.google.com/fh/files/misc/gemini-for-google-workspace-prompting-guide-101.pdf
⤷ AI in the Enterprise by OpenAI : https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf

By Hugging Face 🤗:
⤷ AI Agents Course by Hugging Face : https://huggingface.co/learn/agents-course/unit0/introduction
⤷ smolagents docs : https://huggingface.co/docs/smolagents/en/tutorials/building_good_agents
⤷ MCP Course by Hugging Face : https://huggingface.co/learn/mcp-course/unit0/introduction
⤷ Other courses (LLM, Computer Vision, Deep RL, Audio, Diffusion, Cookbooks, etc.) : https://huggingface.co/learn

prithivMLmods posted an update 29 days ago
Just made a demo for Cosmos-Reason1, a physical AI model that understands physical common sense and generates appropriate embodied decisions in natural language through long chain-of-thought reasoning. Also added video understanding support to it. 🤗🚀

✦ Try the demo here : prithivMLmods/DocScope-R1

⤷ Cosmos-Reason1-7B : nvidia/Cosmos-Reason1-7B
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ Captioner-Relaxed : Ertugrul/Qwen2.5-VL-7B-Captioner-Relaxed

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

⤷ GitHub :
https://github.com/PRITHIVSAKTHIUR/Cosmos-x-DocScope
https://github.com/PRITHIVSAKTHIUR/Nvidia-Cosmos-Reason1-Demo

To learn more, visit the model card of the respective model.
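
For reference, a hedged sketch of video question answering with Cosmos-Reason1-7B, assuming the checkpoint follows the standard Qwen2.5-VL loading recipe in transformers and that qwen-vl-utils handles frame sampling (the hosted demo may be wired differently):

```python
# Hedged sketch: video question answering with Cosmos-Reason1-7B.
# Assumptions: the checkpoint loads with the Qwen2.5-VL classes; the video
# path is a placeholder; qwen-vl-utils performs the frame extraction.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "nvidia/Cosmos-Reason1-7B"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "file:///path/to/clip.mp4"},  # placeholder clip
        {"type": "text", "text": "Is the action in this clip physically plausible? Think step by step."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```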