Dataset Viewer for PDFs just landed on Hugging Face 📖🤗 you can now preview all the PDFs easier than before!
on top of this, there's PdfFolder format to load the PDF datasets quicker 💨 > to use it, your dataset should follow a directory format like folder/train/doc1.pdf, folder/train/doc1.pdf > if you want to include bounding boxes, labels etc. you can keep them in a metadata.csv file in the same folder 🤝
Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.
Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.
Prizes, amazing host speakers, ... if you want more details, navigate to https://lu.ma/fmvdjmur!
🚀 SmolAgents v1.19.0 is live! This release brings major improvements to agent flexibility, UI usability, streaming architecture, and developer experience: making it easier than ever to build smart, interactive AI agents. Here's what's new:
🔧 Agent Upgrades - Support for managed agents in ToolCallingAgent - Context manager support for cleaner agent lifecycle handling - Output formatting now uses XML tags for consistency
🖥️ UI Enhancements - GradioUI now supports reset_agent_memory: perfect for fresh starts in dev & demos.
🔄 Streaming Refactor - Streaming event aggregation moved off the Model class - ➡️ Better architecture & maintainability
📦 Output Tracking - CodeAgent outputs are now stored in ActionStep - ✅ More visibility and structure to agent decisions
🐛 Bug Fixes - Smarter planning logic - Cleaner Docker logs - Better prompt formatting for additional_args - Safer internal functions and final answer matching
📚 Docs Improvements - Added quickstart examples with tool usage - One-click Colab launch buttons - Expanded reference docs (AgentMemory, GradioUI docstrings) - Fixed broken links and migrated to .md format
🖼️ VLMs/OCR > moonshotai/Kimi-VL-A3B-Thinking-2506 is a powerful reasoning vision LM, 3B active params, smarter with less tokens, supports long documents, videos 👏 (OS) > nanonets/Nanonets-OCR-s is 3.75B params OCR model based on Qwen2.5VL-3B-Instruct (OS)
🗣️ Audio > Google released google/magenta-realtime, real time music generation & audio synthesis (cc-by-4) > kyutai released new speech-to-text models that come in 1B & 2B (kyutai/stt-1b-en_fr, stt-2b-en_fr) with 0.5s and 2.5s delay
✨ 32B - Apache 2.0 ✨ 38.0% pass@1 on SWE-bench Verified ✨ Up to 47.0% with test-time scaling ✨ Shows clear data scaling law (8K+ demos) ✨ Built on Qwen2.5-Coder-32B + OpenHands