Honored to be named among the 12 pioneers and power players in the news industry in the 2025 Tech Trends Report from Future Today Strategy Group.
Incredible group to be part of - each person is doing groundbreaking work at the intersection of AI and journalism. Worth following them all: they're consistently sharing practical insights on building the future of news.
Take the time to read this report; it's packed with insights, as always. The news & information section's #1 insight hits hard: "The most substantive economic impact of AI to date has been licensing payouts for a handful of big publishers. The competition will start shifting in the year ahead to separate AI 'haves' that have positioned themselves to grow from the 'have-nots.'"
This AI-driven divide is something I've been really concerned about. Now, more than ever, is the time to build!
Here's a quick walkthrough of the first drop of material that works toward that use case:
- First, a fundamental introduction to reinforcement learning, answering questions like ‘what is a reward?’ and ‘how do we create an environment for a language model?’
- Then it focuses on DeepSeek-R1, walking through the paper and highlighting key aspects. This is an old-school way to learn ML topics, but it always works.
- Next, it takes you to Transformers Reinforcement Learning (TRL) and demonstrates potential reward functions you could use. This is cool because it uses Marimo notebooks to visualise the reward.
- Finally, Maxime walks us through a real training notebook that uses GRPO to reduce generation length (see the sketch after this list). I’m really into this because it works, and Maxime took the time to validate it and share assets and logs from his own runs for you to compare against.
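To make that last step concrete, here's a minimal sketch of a length-penalising reward plugged into TRL's GRPOTrainer. The model id, length budget, and dataset are illustrative placeholders, not Maxime's exact setup:

```python
# Minimal sketch: a length-penalising reward for TRL's GRPOTrainer.
# The model id, max_len budget, and dataset are placeholders, not the
# exact configuration from Maxime's notebook.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def length_reward(completions, **kwargs):
    # 1.0 for an empty completion, falling linearly to 0.0 at max_len
    # characters; anything longer is clamped at 0.0.
    max_len = 512
    return [max(0.0, 1.0 - len(c) / max_len) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # any prompt dataset works

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=length_reward,
    args=GRPOConfig(output_dir="grpo-short-completions"),
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples several completions per prompt and pushes the policy toward the higher-reward ones, so a reward that decays with length steadily shortens generations.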
Maxime’s work and notebooks have been a major part of the open source community over the last few years. I, like everyone, have learnt so much from them.
Extremely bullish on @CohereForAI's Aya Vision (8B & 32B) - new SOTA open-weight VLMs
- 8B wins up to 81% of the time in its class, better than Gemini Flash
- 32B beats Llama 3.2 90B!
- Covers 23 languages, excels in image captioning, VQA & more
- Integrated in transformers from Day 0!
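Since it's integrated in transformers from day 0, trying it takes only a few lines. A minimal sketch, assuming the Hub id CohereForAI/aya-vision-8b and a recent transformers version (check the model card for the canonical example):

```python
# Sketch of running Aya Vision 8B with transformers. The Hub id and the
# image URL are assumptions; see the model card for exact usage.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "CohereForAI/aya-vision-8b"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
print(processor.tokenizer.decode(
    out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```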
What if AI becomes as ubiquitous as the internet, but runs locally and transparently on our devices?
Fascinating TED talk by @thomwolf on open source AI and its future impact.
Imagine this for AI: instead of black box models running in distant data centers, we get transparent AI that runs locally on our phones and laptops, often without needing internet access. If the original team moves on? No problem - resilience is one of the beauties of open source. Anyone (companies, collectives, or individuals) can adapt and fix these models.
This is a compelling vision of AI's future that solves many of today's concerns around AI transparency and centralized control.
I'm excited to share the first episode of our AI-generated podcast series focusing on nice datasets from the Hugging Face Hub!
This first episode explores mathematical reasoning datasets:
- SynthLabsAI/Big-Math-RL-Verified: over 250,000 rigorously verified problems spanning multiple difficulty levels and mathematical domains
- open-r1/OpenR1-Math-220k: 220,000 math problems with multiple reasoning traces, verified for accuracy using Math Verify and Llama-3.3-70B models
- facebook/natural_reasoning: 1.1 million general reasoning questions, carefully deduplicated and decontaminated from existing benchmarks, showing superior scaling effects when training models like Llama3.1-8B-Instruct
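If you want to poke at one of these yourself, it's a single call with the datasets library. A quick sketch (column names differ per dataset, so inspect the dataset object before indexing specific fields):

```python
# Peek at one of the episode's datasets. Column names vary per dataset,
# so print the dataset object first to see what fields exist.
from datasets import load_dataset

ds = load_dataset("open-r1/OpenR1-Math-220k", split="train")
print(ds)      # row count and column names
print(ds[0])   # first example
```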
Is this the best tool to extract clean info from PDFs, handwriting and complex documents yet?
Open source olmOCR just dropped and the results are impressive.
Tested the free demo with various documents, including a handwritten Claes Oldenburg letter. The speed is impressive: 3,000 tokens/second on your own GPU - that works out to about $190 per million pages, roughly 1/32 the cost of GPT-4o. Game-changer for content extraction and digital archives.
To achieve this, Ai2 trained a 7B vision language model on 260K pages from 100K PDFs using "document anchoring" - combining PDF metadata with page images.
Best part: it actually understands document structure (columns, tables, equations) instead of just jumbling everything together like most OCR tools. Their human eval results back this up.
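If you want to run it yourself, olmOCR ships a pipeline entry point. A rough sketch based on the project README at release (flag names may have changed since; needs a suitable GPU and the poppler utilities installed):

```python
# Run olmOCR's documented pipeline on a single PDF from Python.
# The flags follow the README at release time and may have changed;
# extracted text lands in the workspace directory as JSONL.
import subprocess

subprocess.run(
    ["python", "-m", "olmocr.pipeline", "./workspace", "--pdfs", "letter.pdf"],
    check=True,
)
```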
This week we are releasing the first framework unit in the course, and it's on smolagents. This is what the unit covers:
- why you should use smolagents vs another library
- how to build agents that use code
- how to build multi-agent systems
- how to use vision language models for browser use
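For a tiny taste of what the unit teaches, here's a minimal CodeAgent sketch (API names, e.g. HfApiModel, follow the smolagents release at the time of the course and may evolve between versions):

```python
# Minimal smolagents example: a CodeAgent that writes and runs Python to
# answer a question, with web search as its only tool. Names follow the
# smolagents API at the time of the course and may change later.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # let the agent search the web
    model=HfApiModel(),              # defaults to a hosted model via the HF API
)
agent.run("How many seconds would it take a leopard at full speed to cross Pont des Arts?")
```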
The team has been working flat out on this for a few weeks. Led by @sergiopaniego and supported by smolagents author @m-ric.
🔥 Agents can do anything! @microsoft Research just announced the release of Magma 8B!
Magma is a new Visual Language Model (VLM) with 8B parameters for multi-modal agents, designed to handle complex interactions across virtual and real environments - and it's MIT licensed!

Magma comes with exciting new features:
- Introduces the Set-of-Mark and Trace-of-Mark techniques for fine-tuning
- Leverages a large amount of unlabeled video data to learn spatial-temporal grounding and planning
- Strong generalization and the ability to be fine-tuned for other agentic tasks
- SOTA on multi-modal benchmarks spanning UI navigation, robotics manipulation, image/video understanding, and spatial understanding and reasoning
- Generates goal-driven visual plans and actions for agentic use cases
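Loading it takes a couple of lines with transformers. A sketch assuming the Hub id microsoft/Magma-8B (the repo ships custom modelling code, hence trust_remote_code):

```python
# Sketch of loading Magma-8B. The Hub id is an assumption; the repo ships
# custom modelling code, which is why trust_remote_code=True is needed.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Magma-8B"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```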
🚀 Just launched: A toolkit of 20 powerful AI tools that journalists can use right now - transcribe, analyze, create. 100% free & open-source.
Been testing all these tools myself and created a searchable collection of the most practical ones - from audio transcription to image generation to document analysis. No coding needed, no expensive subscriptions.
Some highlights I've tested personally:
- Private, on-device transcription with speaker ID in 100+ languages using Whisper
- Website scraping that just works - paste a URL, get structured data
- Local image editing with tools like Finegrain (impressive results)
- Document chat using Qwen 2.5 72B (handles technical papers well)
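No coding is needed for any of these, but if you do write Python, the same Whisper models are one pipeline call away. A sketch (whisper-small is a placeholder choice; speaker ID needs a separate diarization step):

```python
# Local transcription with Whisper via the transformers pipeline.
# whisper-small is a placeholder; larger checkpoints are more accurate
# but need more memory. Diarization (speaker ID) is not included here.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
result = asr("interview.mp3", return_timestamps=True)  # timestamps enable long audio
print(result["text"])
```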
Sharing this early because the best tools come from the community. Drop your favorite tools in the comments or join the discussion on what to add next!
We now have a Deep Research for academia: SurveyX automatically writes academic surveys nearly indistinguishable from human-written ones 🔥
Researchers from Beijing and Shanghai just published the first application of a deep research system to academia: their algorithm, given a question, can give you a survey of all papers on the subject.
To make a research survey, you generally follow two steps: preparation (collecting and organizing papers) and writing (outline creation, drafting, polishing). The researchers followed the same two steps and automated them.
🎯 For the preparation part, a key challenge is finding all the important references on the given subject. The researchers first cast a wide net over all relevant papers. But then finding the really important ones is like distilling knowledge from a haystack of information. To solve this, they built an “AttributeTree” object that structures key information from citations. Ablating these AttributeTrees significantly decreased structure and synthesis scores, so they were really useful!
📝 For the writing part, the key was to get a synthesis that's both short and true. This is not easy to get with LLMs! So they used methods like LLM-based deduplication to shorten the overly verbose listings LLMs produce, and RAG to grab original quotes instead of made-up ones.
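As a toy illustration of the deduplication idea only - this is not SurveyX's code, and it uses embedding similarity rather than their LLM-based method:

```python
# Toy near-duplicate filtering by embedding similarity. Not SurveyX's
# implementation (they use an LLM-based method); this just shows the idea
# of collapsing redundant statements before synthesis.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
claims = [
    "Transformers dominate NLP benchmarks.",
    "Transformer models lead on NLP benchmarks.",  # near-duplicate of the first
    "RNNs struggle with long-range dependencies.",
]
emb = model.encode(claims, normalize_embeddings=True)

kept = []
for i in range(len(claims)):
    # keep a claim only if it is not too similar to one already kept
    if all(float(emb[i] @ emb[j]) < 0.9 for j in kept):
        kept.append(i)

print([claims[i] for i in kept])
```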
As a result, their system outperforms previous approaches by far!
As assessed by LLM judges, the quality score of SurveyX even approaches that of human experts, at 4.59/5 vs 4.75/5 🏆
Trying something new to keep you ahead of the curve: The 5 AI stories of the week - a weekly curation of the most important AI news you need to know. Do you like it?
Hacked together a way to log trl GRPO training completions to a 🤗 dataset repo. This allows you to:
- Track rewards from multiple reward functions
- Treat the completions and rewards from training as a "proper" dataset and do EDA
- Share results for open science
The implementation is super hacky, but I'm curious if people would find this useful.
To push completions to the Hub, you just need two extra parameters on the trainer.
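Since the exact parameter names live in the hacked trainer rather than in trl itself, here's a sketch of the underlying idea using only the standard datasets API (the repo id is a placeholder):

```python
# Sketch of the idea, not the actual patch: collect completions and their
# rewards during training, then push them to the Hub as a dataset for EDA
# and sharing. The repo id below is a placeholder.
from datasets import Dataset

records = {"completion": [], "reward": []}

# ...inside the training loop, append each sampled completion and reward:
records["completion"].append("example completion text")
records["reward"].append(0.42)

Dataset.from_dict(records).push_to_hub("your-username/grpo-completions")
```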
AGENTS + FINETUNING! This week Hugging Face Learn has a whole pathway on finetuning for agentic applications. You can follow these two courses to level up your agent game beyond prompts: