Llama-4 is out and I couldn't resist cooking something with it... So I came up with LlamaResearcher (https://llamaresearcher.com), your deep-research AI companion!
The workflow behind LlamaResearcher is simple:
- You submit a query
- Your query is evaluated by a Llama 3 guard model, which deems it safe or unsafe
- If your query is safe, it is routed to the Researcher Agent
- The Researcher Agent expands the query into three sub-queries to search the web with
- The web is searched for each of the sub-queries
- The retrieved information is evaluated for relevance against your original query
- The Researcher Agent produces an essay based on the information it gathered, taking care to reference its sources
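The pipeline above can be sketched, very roughly, in plain Python. All function bodies here are illustrative stand-ins, not the project's actual code: the real app uses a Llama guard model for safety, Llama-4 for the agent, and a web-search service for retrieval.

```python
def guard_is_safe(query: str) -> bool:
    """Stand-in for the guard-model safety check."""
    return "attack" not in query.lower()

def expand_query(query: str) -> list[str]:
    """Stand-in for the agent expanding a query into three sub-queries."""
    return [f"{query} overview", f"{query} recent developments", f"{query} criticisms"]

def search_web(sub_query: str) -> list[str]:
    """Stand-in for a sourced web search."""
    return [f"result for: {sub_query}"]

def is_relevant(snippet: str, original_query: str) -> bool:
    """Stand-in for the relevance check against the original query."""
    return original_query.split()[0].lower() in snippet.lower()

def research(query: str) -> str:
    """Run the full sketch pipeline: guard -> expand -> search -> filter -> compose."""
    if not guard_is_safe(query):
        return "Query rejected by the safety check."
    snippets = [s for sq in expand_query(query) for s in search_web(sq)]
    relevant = [s for s in snippets if is_relevant(s, query)]
    # The real agent writes a referenced essay; here we just list the sources.
    return "Essay based on sources:\n" + "\n".join(f"- {s}" for s in relevant)

print(research("quantum computing"))
```

The key design point is that each stage is a separate, swappable block: you can change the guard, the search backend, or the composer without touching the rest of the chain.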
The agent itself is also built with easy-to-use, intuitive blocks:
- LlamaIndex provides the agentic architecture and the integrations with the language models
- Groq serves Llama-4 with lightning-fast inference
- Linkup lets the agent deep-search the web and returns sourced answers
- FastAPI does the heavy lifting of wrapping everything in an elegant API interface
- Redis handles API rate limiting
- Gradio provides a simple but powerful user interface
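To give a feel for the rate-limiting block, here is a minimal fixed-window limiter in plain Python. This is an illustrative sketch, not the app's code: the real setup uses Redis so that counts are shared across API workers, while this in-memory version only works within a single process.

```python
import time

class FixedWindowLimiter:
    """Allow at most `max_requests` per client per time window (in-memory sketch)."""

    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._counts = {}  # (client_id, window index) -> request count

    def allow(self, client_id: str, now=None) -> bool:
        """Return True if this request fits within the client's current window."""
        now = time.time() if now is None else now
        key = (client_id, int(now // self.window_seconds))
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self.max_requests

limiter = FixedWindowLimiter(max_requests=3, window_seconds=60)
print([limiter.allow("client-1", now=0.0) for _ in range(4)])  # [True, True, True, False]
```

With Redis, the same pattern is typically a single `INCR` on a key like `rate:{client}:{window}` with an expiry, which makes the counter atomic across processes.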
Special mention also to Lovable, which helped me build the first draft of the landing page for LlamaResearcher!
If you're curious and want to try LlamaResearcher, you can, completely free and without a subscription, for 30 days from now: https://llamaresearcher.com
And if you're like me and like getting your hands on the code and building stuff on your own machine, I have good news: it's all open source, fully reproducible locally, and Docker-ready. Just head to the GitHub repo (https://github.com/AstraBert/llama-4-researcher) and don't forget to star it if you find it useful!
As always, have fun and feel free to leave your feedback!
Huge week for the xet-team: Llama 4 is the first major model on Hugging Face uploaded with Xet as the backing storage! Every byte downloaded comes through our infrastructure.
Using Xet on Hugging Face is the fastest way to download and iterate on open-source models, and we've proved it with Llama 4, which saw a boost of ~25% across all its models.
We expect builders on the Hub to see even more improvements, helping power innovation across the community.
With the models on our infrastructure, we can peer in and see how well our dedupe performs across the Llama 4 family. On average, we're seeing ~25% dedupe, providing huge savings for everyone who iterates on these state-of-the-art models. The attached image shows a few selected models and how they perform on Xet.
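To illustrate what "~25% dedupe" means, here is a toy chunk-level deduplication example. This is purely illustrative: it hashes fixed-size chunks, whereas a real storage backend like Xet operates on much larger binary model files and more sophisticated chunking.

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int = 4) -> list[str]:
    """Split data into fixed-size chunks and hash each one."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def dedupe_ratio(old: bytes, new: bytes, chunk_size: int = 4) -> float:
    """Fraction of the new file's chunks already stored for the old file."""
    old_set = set(chunk_hashes(old, chunk_size))
    new_hashes = chunk_hashes(new, chunk_size)
    shared = sum(1 for h in new_hashes if h in old_set)
    return shared / len(new_hashes)

# A "fine-tune" that changes one of four chunks dedupes 75%:
base = b"AAAABBBBCCCCDDDD"
tuned = b"AAAABBBBCCCCXXXX"
print(dedupe_ratio(base, tuned))  # 0.75
```

Only the chunks that actually changed need to be uploaded or stored again, which is where the savings for iterating on model families come from.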
Thanks to the meta-llama team for launching on Xet!
What, How, Where, and How Well? This paper reviews test-time scaling methods and all you need to know about them:
> what to scale: parallel, sequential, hybrid, and internal scaling
> how to scale: SFT, RL, search, and verification
> metrics and evals of test-time scaling
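As a concrete flavor of the parallel family, here is a minimal best-of-N selection sketch: sample N candidate answers, score each with a verifier, keep the best. The candidate list and the verifier below are toy stand-ins, not anything from the paper.

```python
def verifier_score(question: int, answer: int) -> float:
    """Toy verifier: prefers answers closer to 2 * question."""
    return -abs(answer - 2 * question)

def best_of_n(question: int, candidates: list[int]) -> int:
    """Parallel test-time scaling: pick the candidate the verifier scores highest."""
    return max(candidates, key=lambda a: verifier_score(question, a))

# Four "samples" for the question 21; the verifier selects 42.
print(best_of_n(21, [40, 41, 42, 44]))  # 42
```

Sequential methods would instead refine one answer step by step, and hybrid methods combine both; the verifier here is the "verification" axis of the how-to-scale taxonomy.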