AI & ML interests

Earth Observation Datasets

Recent Activity

Major-TOM's activity

fdaudens 
posted an update 2 days ago
view post
Post
222
This is the story of how open source AI created a $3M business for a news company:

Clare Spencer tells on the GAIN blog how a Danish software engineer found OpenAI's Whisper model and turned it into Good Tape. It's now generating $3M ARR for news service Zetland.

Great playbook on how to build a good product:
- This idea came from a software engineer, Jakob Steinn, who was not only able to spot a new model, but also listen to feedback from his colleagues in the newsrooms (he thought they would use it for translation, but they were more interested in transcription in Danish)
- They built iteratively: they went from running the model in the terminal to a notebook to a full-fledged web interface
- They didn't just wrap the API. They rebuilt the transcription engine from scratch, moved it to TPUs for 45-second processing of hour-long audio, and added EU-based data sovereignty

Now Good Tape has 2.5M users worldwide, with only 30-35% being journalists.
Small languages (Danish, Finnish, Croatian, Hebrew) were underserved by existing tools - suddenly there's a "very very big market" when you put them together.

This shows how open source AI can solve real workflow problems and create sustainable businesses. Sometimes the best opportunities emerge from solving your own daily problems.

Worth a read: https://generative-ai-newsroom.com/how-a-danish-news-service-made-a-profit-with-its-transcription-tool-285bc05b7cf9
prithivMLmods 
posted an update 6 days ago
view post
Post
4708
OpenAI, Google, Hugging Face, and Anthropic have released guides and courses on building agents, prompting techniques, scaling AI use cases, and more. Below are 10+ minimalistic guides and courses that may help you in your progress. 📖

⤷ Agents Companion : https://www.kaggle.com/whitepaper-agent-companion
⤷ Building Effective Agents : https://www.anthropic.com/engineering/building-effective-agents
⤷ Guide to building agents by OpenAI : https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
⤷ Prompt engineering by Google : https://www.kaggle.com/whitepaper-prompt-engineering
⤷ Google: 601 real-world gen AI use cases : https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders
⤷ Prompt engineering by IBM : https://www.ibm.com/think/topics/prompt-engineering-guide
⤷ Prompt Engineering by Anthropic : https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
⤷ Scaling AI use cases : https://cdn.openai.com/business-guides-and-resources/identifying-and-scaling-ai-use-cases.pdf
⤷ Prompting Guide 101 : https://services.google.com/fh/files/misc/gemini-for-google-workspace-prompting-guide-101.pdf
⤷ AI in the Enterprise by OpenAI : https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf

by HF🤗 :
⤷ AI Agents Course by Huggingface : https://huggingface.co/learn/agents-course/unit0/introduction
⤷ Smol-agents Docs : https://huggingface.co/docs/smolagents/en/tutorials/building_good_agents
⤷ MCP Course by Huggingface : https://huggingface.co/learn/mcp-course/unit0/introduction
⤷ Other Course (LLM, Computer Vision, Deep RL, Audio, Diffusion, Cookbooks, etc..) : https://huggingface.co/learn
  • 2 replies
·
prithivMLmods 
posted an update 7 days ago
view post
Post
2137
Just made a demo for Cosmos-Reason1, a physical AI model that understands physical common sense and generates appropriate embodied decisions in natural language through long chain-of-thought reasoning. Also added video understanding support to it. 🤗🚀

✦ Try the demo here : prithivMLmods/DocScope-R1

⤷ Cosmos-Reason1-7B : nvidia/Cosmos-Reason1-7B
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ Captioner-Relaxed : Ertugrul/Qwen2.5-VL-7B-Captioner-Relaxed

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

⤷ GitHub :
https://github.com/PRITHIVSAKTHIUR/Cosmos-x-DocScope
https://github.com/PRITHIVSAKTHIUR/Nvidia-Cosmos-Reason1-Demo.

To know more about it, visit the model card of the respective model. !!
clem 
posted an update 7 days ago
view post
Post
5310
Today, we're unveiling two new open-source AI robots! HopeJR for $3,000 & Reachy Mini for $300 🤖🤖🤖

Let's go open-source AI robotics!
·
fdaudens 
posted an update 8 days ago
view post
Post
2850
🎵 Dream come true for content creators! TIGER AI can extract voice, effects & music from ANY audio file 🤯
This lightweight model uses frequency band-split technology to separate speech like magic. Kudos to @fffiloni for the amazing demo! fffiloni/TIGER-audio-extraction
fdaudens 
posted an update 10 days ago
view post
Post
3759
Just completed the AI Agents course and wow, that capstone project really makes you understand how to build agents that can handle real-world complexity!

The final project uses the GAIA dataset - your agent has to solve tasks like analyzing Excel files, processing audio recordings, answering questions about YouTube videos, and diving into research papers. This isn't toy examples, it's the messy, multimodal stuff agents need to handle in practice.

Whether you’re just getting started with agents or want to go deeper with tools like LangChain, LlamaIndex, and SmolAgents, this course has tons of useful stuff. A few key insights:
- Code agents are incredibly versatile once you get the architecture right
- The sweet spot is finding the right balance of guidance vs autonomy for each use case
- Once the logic clicks, the possibilities really are endless - it's like letting LLMs break free from the chatbox

The course is free and the certification deadline is July 1st, 2025.

The Hugging Face team built something special here. If you're tired of AI that impresses in demos but fails in practice, this is your path to building agents that actually deliver. https://huggingface.co/learn/agents-course/unit0/introduction

Best part? There's the MCP course next!
clem 
posted an update 11 days ago
view post
Post
3213
It's just become easier to share your apps on the biggest AI app store (aka HF spaces) for unlimited storage, more visibility and community interactions.

Just pick a React, Svelte, or Vue template when you create your space or add app_build_command: npm run build in your README's YAML and app_file: build/index.html in your README's YAML block.

Or follow this link: https://huggingface.co/new-space?sdk=static

Let's build!
  • 1 reply
·
fdaudens 
posted an update 12 days ago
view post
Post
2515
Two lines in your terminal and you have an AI agent running whatever model and tools you want 🤯

Just tried the new Tiny Agents in Python. Asked it which team won the Italian Serie A soccer league and to export the final table to CSV. Coolest thing is you can interact with the agent, guide it, and correct its mistakes.

The agent connected to web browsing tools, searched for Serie A standings, identified the champion, and generated a CSV export.

The setup:
pip install "huggingface_hub[mcp]>=0.32.0"
tiny-agents run


That's it. The MCP protocol handles all the tool integrations automatically - no custom APIs to write, no complex setups. Want file system access? It's already there. Need web browsing? Built in.

You can swap models, change inference providers, run local models, or add new tools just by editing a simple JSON config. You can also use Gradio Spaces as MCP servers! The entire agent is ~70 lines of Python - essentially a while loop that streams responses and executes tools. Everything is open-source. ❤️ Hugging Face

Blog post: https://huggingface.co/blog/python-tiny-agents
  • 1 reply
·
fdaudens 
posted an update 13 days ago
view post
Post
2438
Here’s what happens when a national institution builds its own digital intelligence: France’s Ministry of Culture just released 17K+ real users testing 30+ chatbots in French. Raw, diverse, and a goldmine for studying LLMs in the wild.

ministere-culture/comparia-conversations
clem 
posted an update 15 days ago
view post
Post
3471
Playing with Veo3 this morning. Share your prompt if you want me to create videos for you (bonus point if they funnily reference HF/open-source). These videos are "a cat on the moon rapping "I love Hugging Face""!
·
Jofthomas 
posted an update 15 days ago
view post
Post
2729
Meet our new agentic model : 𝗗𝗲𝘃𝘀𝘁𝗿𝗮𝗹

Devstral is an open-source LLM built software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌.

𝗞𝗲𝘆 𝗳𝗲𝗮𝘁𝘂𝗿𝗲𝘀 :
• 🤖 𝗔𝗴𝗲𝗻𝘁𝘀 : perfect for Agentic coding
• 🍃 𝗹𝗶𝗴𝗵𝘁𝘄𝗲𝗶𝗴𝗵𝘁: Devstral is a 𝟮𝟰𝗕 parameter based on Mistral small.
• ©️ 𝗔𝗽𝗮𝗰𝗵𝗲 𝟮.𝟬, meaning fully open-source !
• 📄 A 𝟭𝟮𝟴𝗸 context window.

📚Blog : https://mistral.ai/news/devstral
⚡API : The model is also available on our API under the name 𝗱𝗲𝘃𝘀𝘁𝗿𝗮𝗹-𝘀𝗺𝗮𝗹𝗹-𝟮𝟱𝟬𝟱
🤗 repo : mistralai/Devstral-Small-2505

Can't wait to see what you will build with it !
  • 1 reply
·
prithivMLmods 
posted an update 16 days ago
view post
Post
2281
Got access to Google's all-new Gemini Diffusion a state-of-the-art text diffusion model. It delivers the performance of Gemini 2.0 Flash-Lite at 5x the speed, generating over 1000 tokens in a fraction of a second and producing impressive results. Below are some initial outputs generated using the model. ♊🔥

Gemini Diffusion Playground ✦ : https://deepmind.google.com/frontiers/gemini-diffusion

Get Access Here : https://docs.google.com/forms/d/1aLm6J13tAkq4v4qwGR3z35W2qWy7mHiiA0wGEpecooo/viewform?edit_requested=true

🔗 To know more, visit: https://deepmind.google/models/gemini-diffusion/
  • 1 reply
·
prithivMLmods 
posted an update 17 days ago
view post
Post
2289
The more optimized explicit content filters with lightweight 𝙜𝙪𝙖𝙧𝙙 models trained based on siglip2 patch16 512 and vit patch16 224 for illustration and explicit content classification for content moderation in social media, forums, and parental controls for safer browsing environments. this version fixes the issues in the previous release, which lacked sufficient resources. 🚀

⤷ Models :
→ siglip2 mini explicit content : prithivMLmods/siglip2-mini-explicit-content [recommended]
→ vit mini explicit content : prithivMLmods/vit-mini-explicit-content

⤷ Building image safety-guard models : strangerguardhf

⤷ Datasets :
→ nsfw multidomain classification : strangerguardhf/NSFW-MultiDomain-Classification
→ nsfw multidomain classification v2.0 : strangerguardhf/NSFW-MultiDomain-Classification-v2.0

⤷ Collection :
→ Updated Versions [05192025] : prithivMLmods/explicit-content-filters-682aaa4733e378561925ca2b
→ Previous Versions : prithivMLmods/siglip2-content-filters-042025-final-680fe4aa1a9d589bf2c915ff

Find a collections inside the collection.👆

To know more about it, visit the model card of the respective model.
  • 1 reply
·
prithivMLmods 
posted an update 21 days ago
view post
Post
2696
Models for detecting images generated by diffusion models (Flux.1, SDXL, ..) are trained or fine-tuned using image classification models for content moderation. These models use datasets available on the Hub. For identifying AI-generated images or moderating visual content, the recommended model is OpenSDI-Flux.1-SigLIP2.😺🧨

Models : prithivMLmods/OpenSDI-Flux.1-SigLIP2 [Best approach for AI [Diffusion Generated] vs. real image classification] prithivMLmods/OpenSDI-SD2.1-SigLIP2 prithivMLmods/OpenSDI-SD3-SigLIP2 prithivMLmods/OpenSDI-SD1.5-SigLIP2 prithivMLmods/OpenSDI-SDXL-SigLIP2

Datasets : nebula/OpenSDI_test madebyollin/megalith-10m

Collection : prithivMLmods/opensdi-diffusion-generated-image-classification-682488a3a3e5be7083db3383

Find a collections inside the collection.👆

To know more about it, visit the model card of the respective model.
fdaudens 
posted an update 22 days ago
view post
Post
5122
Tried something new: an AI-generated podcast that breaks down the top research paper each day. Fully automated, now live on Spotify.

I built this prototype to help keep up with the rapid pace of AI developments and, hopefully, make cutting-edge research more accessible. I don’t know about you, but just listening to a conversation about a paper really helps the content sink in for me.

This build taught me a lot about full automation. If you’re into the technical weeds: Qwen3 runs on Inference to handle the script, Kokoro does the voice, and the whole thing gets published automatically thanks to the Hugging Face Jobs API and Gradio deployment.

It’s not perfect yet — I’ll be monitoring for hallucinations and incoherence. The voice model still needs polish, but it’s a promising start. Would love to build this with the community — submit a PR or send feedback. It’s just a beta of an experimental idea!

Big kudos to @m-ric , whose Open NotebookLM this is based on, and to @nielsr for his terrific work making research papers more accessible.

- Podcast on Spotify: https://open.spotify.com/show/3PTucIW1w1GIkqTYm32ka7?si=c7a851f83e6d4331 (Apple Podcasts soon)
- Code: fdaudens/podcast-jobs
- Open NotebookLM: m-ric/open-notebooklm
- Also super helpful, @qgallouedec 's tutorial on HF Jobs API: qgallouedec/run-hello-world
  • 1 reply
·
prithivMLmods 
posted an update 22 days ago
view post
Post
2026
Dropping some image classification models for content moderation and classifiers trained with datasets available on the Hub. All are fine-tuned on the siglip2 backbone, (competitions AIOrNot, Imagenette, and Driver-Drowsiness). Models and datasets are listed below:

🤗Models :
AI or Not : prithivMLmods/AIorNot-SigLIP2
Driver Drowsiness Detection : prithivMLmods/DOZE-GUARD-RLDD
Subset 10 ImageNet : prithivMLmods/IMAGENETTE

🥊Datasets :
+ competitions/aiornot
+ akahana/Driver-Drowsiness-Dataset
+ frgfm/imagenette

🔗Collection :
[The previous collection of models is also listed in the same collection, so you can find more models focused on image classification tasks.]

- prithivMLmods/multiclass-image-classification-05142025-68234c8010a9350a4d6739b5

Find a collections inside the collection.🤪👆

To know more about it, visit the model card of the respective model.
clem 
posted an update 23 days ago
view post
Post
3132
Very cool to see pytorch contributing on Hugging Face. Time to follow them to see what they're cooking!
  • 2 replies
·
fdaudens 
posted an update 24 days ago
view post
Post
787
Hey! I built an AI Agent to query the FOIA API for a workshop at the Hacks/Hackers Summit in Baltimore and you can do it too!

It’s a quick proof of concept to demo what agents can do, how to design workflows, and how to approach the coding side. TWant a fun project to learn how AI agents work? I built one that queries the FOIA API — and you can too!

It's a quick proof of concept I did for a workshop at the Hacks/Hackers Summit in Baltimore, demonstrating what agents can do, how to design workflows, and approaches to coding them.

- Slides https://docs.google.com/presentation/d/1lbf5K0yi213N7uxGnVKJdGWq2i0GayWj4vIcLkVlwD8/edit?usp=sharing
- Colab notebook https://colab.research.google.com/drive/1iw0qZyTni_6BcK0jj1x6gTfjm85NlaGv
- Gradio app: https://huggingface.co/spaces/JournalistsonHF/foia-agent
- MCP version to plug into Claude, Cursor, etc: https://huggingface.co/spaces/JournalistsonHF/foia-mcp-tools

Feel free to use the Gradio app for real FOIA requests, but also to improve it (I'm far from being a good coder) or adapt it for other countries.

And shout-out to everyone who powered through the workshop! 😅
  • 1 reply
·
prithivMLmods 
posted an update 26 days ago
view post
Post
3523
Dropping some image classification models for content moderation, balancers, and classifiers trained on synthetic datasets—along with others based on datasets available on the Hub. Also loaded a few low-rank datasets for realistic gender portrait classification and document-type classifiers, all fine-tuned on the SigLIP-2 Patch-16 224 backbone. Models and datasets are listed below:

🤗Models & Datasets :

Realistic Gender Classification : prithivMLmods/Realistic-Gender-Classification
prithivMLmods/Realistic-Portrait-Gender-1024px
Document Type Detection : prithivMLmods/Document-Type-Detection
prithivMLmods/Document-Type-Detection
Face Mask Detection : prithivMLmods/Face-Mask-Detection
DamarJati/Face-Mask-Detection
Alzheimer Stage Classifier : prithivMLmods/Alzheimer-Stage-Classifier
SilpaCS/Augmented_alzheimer
Bone Fracture Detection : prithivMLmods/Bone-Fracture-Detection
Hemg/bone-fracture-detection
GiD Land Cover Classification : prithivMLmods/GiD-Land-Cover-Classification
jonathan-roberts1/GID

🤗Collection : prithivMLmods/siglip2-05102025-681c2b0e406f0740a993fc1c

To know more about it, visit the model card of the respective model.