I've published an article showing five ways to use 🪢 Langfuse with 🤗 Hugging Face.
My personal favorite is Method #4: Using Hugging Face Datasets for Langfuse Dataset Experiments. This lets you benchmark your LLM app or AI agent with a dataset hosted on Hugging Face. In this example, I chose the GSM8K dataset (openai/gsm8k) to test the mathematical reasoning capabilities of my smolagent :)
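If you want to try it, here's a rough sketch of the dataset-upload step. The `gsm8k_to_items` helper is my own illustration, and the `Langfuse` client calls assume the Python SDK with `LANGFUSE_*` credentials set — treat this as a sketch, not the article's exact code:

```python
def gsm8k_to_items(rows):
    """Map GSM8K records ({'question': ..., 'answer': ...}) to dataset item payloads."""
    return [
        {"input": {"question": row["question"]},
         "expected_output": row["answer"]}
        for row in rows
    ]

def upload_to_langfuse(dataset_name="gsm8k-benchmark"):
    # Requires `pip install datasets langfuse` and LANGFUSE_* env vars.
    # Not executed here; shown to illustrate the flow.
    from datasets import load_dataset
    from langfuse import Langfuse

    rows = load_dataset("openai/gsm8k", "main", split="train[:10]")
    lf = Langfuse()
    lf.create_dataset(name=dataset_name)
    for item in gsm8k_to_items(rows):
        lf.create_dataset_item(dataset_name=dataset_name, **item)
```

The experiment loop then iterates over the dataset's items, runs your agent on each `input`, and scores the result against `expected_output`.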
🚀 Supercharge your LLM apps with Langfuse on Hugging Face Spaces!
Langfuse brings end-to-end observability and tooling to accelerate your dev workflow, from experiments through production.
Now available as a Docker Space directly on the HF Hub! 🤗
🔍 Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks
1️⃣ One-click deployment: on Spaces with persistent storage and integrated OAuth
📝 Simple Prompt Management: version, edit, and update without redeployment
✅ Intuitive Evals: collect user feedback, run model/prompt evaluations, and improve quality
📊 Dataset Creation: build datasets directly from production data to enhance future performance
Kudos to the Langfuse team for this collab and the awesome, open-first product they're building! 🙌 @marcklingen @Clemo @MJannik
After some heated discussion 🔥, we've clarified our intent regarding storage limits on the Hub.
TL;DR:
- Public storage is free and, barring blatant abuse, unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible.
- Private storage is paid above a significant free tier (1 TB if you have a paid account, 100 GB otherwise).
We continuously optimize our infrastructure to scale our storage for the coming years of growth in machine learning, to the benefit of the community 🔥
Trace LLM calls with Arize AI's Phoenix observability dashboards on Hugging Face Spaces! 🚀
✨ I just added a new recipe to the Open-Source AI Cookbook that shows you how to:
1️⃣ Deploy Phoenix on HF Spaces with persistent storage in a few clicks
2️⃣ Configure LLM tracing with the Serverless Inference API
3️⃣ Observe multi-agent application runs with the CrewAI integration
Observability is crucial for building robust LLM apps.
Phoenix makes it easy to visualize trace data, evaluate performance, and track down issues. Give it a try!
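Here's a minimal sketch of the tracing setup. `space_otlp_endpoint` is a hypothetical helper that assumes the usual `owner-name.hf.space` URL scheme for Spaces, and `register` comes from the `arize-phoenix-otel` package (shown but not executed here):

```python
def space_otlp_endpoint(space_id: str) -> str:
    """Derive a Phoenix Space's OTLP/HTTP traces endpoint from its Space id,
    assuming the standard `owner-name.hf.space` subdomain pattern."""
    owner, name = space_id.lower().split("/")
    return f"https://{owner}-{name}.hf.space/v1/traces"

def setup_tracing(space_id: str):
    # pip install arize-phoenix-otel
    # `register` wires an OpenTelemetry tracer provider to Phoenix.
    from phoenix.otel import register
    return register(endpoint=space_otlp_endpoint(space_id))
```

With the tracer provider registered, the CrewAI instrumentation can then send agent-run spans to your Space.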
Made a new app to visualize the LLM race ➡️ No European company in the top 10 🇪🇺❌
The outcome is quite sad, as a Frenchman and European.
The top 10 is exclusively US 🇺🇸 and Chinese 🇨🇳 companies (after great Chinese LLM releases recently, like the Qwen2.5 series), with the notable exception of Mistral AI 🇫🇷.
American companies are making fast progress, and Chinese ones even faster. Europe is at risk of being left behind. And the EU AI Act, which could slow the EU market further, hasn't even come into force yet. We need to wake up 😬
⚠️ Caution: this Chatbot Arena Elo ranking is not the most accurate, especially at high scores like these, because LLM makers can game it to some extent.
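For context, the standard Elo math (not necessarily LMSys's exact pipeline) shows why small rating gaps are fragile:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One match update: score_a is 1 for an A win, 0 for a loss, 0.5 for a tie."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta
```

For example, a 1244 vs. 1206 gap corresponds to only a ~55% expected win rate, so a modest number of targeted votes can shuffle nearby ranks.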
1 reply
·
reacted to jsulz's post with ❤️🔥, 4 months ago
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That's where our chunk-based approach comes in.
Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:
⏩ Only upload the chunks that changed.
🚀 Download just the updates, not the whole file.
🧠 We store your files as deduplicated chunks.
In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x. But this isn't just a performance boost: it's a rethinking of how we manage models and datasets on the Hub.
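For intuition, here's a toy content-defined chunker. The window, mask, and chunk-size parameters are made up for illustration and this is not Xet's actual algorithm — just the core idea that chunk boundaries depend on local content, so an edit in one place leaves distant chunks (and their hashes) unchanged:

```python
import hashlib

WINDOW, MASK, MIN_CHUNK = 8, 0x1F, 16  # toy parameters, not Xet's real ones

def cdc_chunks(data: bytes) -> list[bytes]:
    """Cut data at content-defined boundaries: wherever a hash of the
    last WINDOW bytes has its low bits all zero (avg chunk ~32 B here)."""
    out, start = [], 0
    for i in range(len(data)):
        if i + 1 < WINDOW or i + 1 - start < MIN_CHUNK:
            continue  # need a full window and a minimum chunk size
        h = int.from_bytes(
            hashlib.blake2b(data[i + 1 - WINDOW:i + 1], digest_size=4).digest(),
            "big")
        if h & MASK == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out

def dedup_store(versions: list[bytes]) -> dict[str, bytes]:
    """Content-addressed store: identical chunks across versions stored once."""
    return {hashlib.sha256(c).hexdigest(): c
            for v in versions for c in cdc_chunks(v)}
```

Because boundaries are chosen from local content rather than fixed offsets, two versions of a file that share most of their bytes also share most of their chunks, so only the changed chunks need to be uploaded.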
We're planning to bring our new storage backend to the Hub in early 2025. Check out our blog to dive deeper, and let us know: how could this improve your workflows?
See below: I've gotten 105k impressions since I started regularly publishing Hub Posts, coming close to my 275k on Twitter!
⚙️ Computed with the great dataset maxiw/hf-posts ⚙️ Thanks to Qwen2.5-Coder-32B for showing me how to access dict attributes in a SQL query!
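For the curious, here's a tiny stand-in for that trick using Python's built-in sqlite3 (SQLite's JSON1 functions). The rows and field names below are invented for illustration, not the real maxiw/hf-posts schema:

```python
import json
import sqlite3

# Toy rows shaped like posts: a nested author dict plus an impressions count.
rows = [
    {"author": {"name": "m-ric"}, "impressions": 60000},
    {"author": {"name": "m-ric"}, "impressions": 45000},
    {"author": {"name": "someone-else"}, "impressions": 5000},
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE posts (data TEXT)")
con.executemany("INSERT INTO posts VALUES (?)",
                [(json.dumps(r),) for r in rows])

# json_extract reaches into dict attributes straight from SQL.
total = con.execute(
    "SELECT SUM(json_extract(data, '$.impressions')) FROM posts "
    "WHERE json_extract(data, '$.author.name') = 'm-ric'"
).fetchone()[0]
print(total)  # 105000
```

The same `json_extract(column, '$.path.to.field')` pattern works in DuckDB and most analytical SQL engines.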
This is no Woodstock AI, but it will be fun nonetheless haha. I'll be hosting a live workshop with team members next week about the Enterprise Hugging Face Hub.
1,000 spots available, first-come first-served, with some surprises during the stream!
TL;DR: Make your model write "margin notes" as you chunk-prefill the KV cache. Then ask it to reread all the notes before it speaks up. Works with humans, works with AI 🤖
WiM leverages the chunked prefill of the key-value cache, which concurrently generates query-based extractive summaries at each step of the prefill that are subsequently reintegrated at the end of the computation. We term these intermediate outputs "margins", drawing inspiration from the practice of making margin notes for improved comprehension of long contexts in human reading. We show that this technique, which adds only minimal additional computation, significantly improves LLMs' long-context reasoning capabilities.
Think: every chunk has a chance to be attended to / be at the end of the context at least once. 🔁
📊 Results:
- An average accuracy boost of 7.5% on multi-hop reasoning tasks like HotpotQA and MultiHop-RAG.
- Even a 30% increase in F1-score on summarisation-like tasks (CWE).
Plus, WiM fits seamlessly into interactive applications (think: progress bar!). It can provide real-time progress updates during data retrieval and integration, making it user-friendly and transparent - a stark contrast to feeding 1M tokens to an LLM and waiting 6 minutes for the first token. 🤯
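For intuition, here's a toy plain-Python sketch of the WiM control flow, with a `generate` callable standing in for the LLM. The real method operates on the KV cache during chunked prefill; this stand-in only mimics the loop:

```python
def wim_answer(query: str, context: str, chunk_size: int, generate) -> str:
    """Chunked 'prefill' with margin notes, then a final read of all notes."""
    margins = []
    for i in range(0, len(context), chunk_size):
        chunk = context[i:i + chunk_size]
        # Each prefill step also emits a query-based extractive "margin note".
        margins.append(generate(f"Note for '{query}': {chunk}"))
        # A progress bar could report (i + len(chunk)) / len(context) here.
    # The model answers from its margin notes rather than the raw long context.
    return generate(f"Answer '{query}' using notes: {' | '.join(margins)}")
```

Swap in a real model call for `generate` and the loop gives you both the accuracy benefit (every chunk is summarized while "fresh") and the per-chunk progress signal mentioned above.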
Given the impressive benchmarks published by Meta for their Llama-3.1 models, I was curious to see how these models would compare to top proprietary models on Chatbot Arena.
Now we've got the results! LMSys released the Elo scores derived from thousands of user votes for the new models, and here are the rankings:
🥇 The 405B model ranks 5th overall, ahead of GPT-4-Turbo! But behind GPT-4o, Claude-3.5 Sonnet, and Gemini-Advanced.
🥈 The 70B model climbs to 9th place! From 1206 ➡️ 1244.
🥉 The 8B model improves from 1152 ➡️ 1170.
✅ This confirms that Llama-3.1 is a strong contender for any task: each of its 3 model sizes is much cheaper to run than equivalent proprietary models!
For instance, here are inference prices for the top models:
➤ GPT-4-Turbo inference price from OpenAI: $5/M input tokens, $15/M output tokens
➤ Llama-3.1-405B from the HF API (for testing only): $3/M for input or output tokens (source linked in the first comment)
➤ Llama-3.1-405B from the HF API (for testing only): free ✨
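Quick back-of-the-envelope with those prices — the 1M-input / 200k-output workload below is an arbitrary example, not from the post:

```python
def cost_usd(in_toks: int, out_toks: int,
             in_per_m: float, out_per_m: float) -> float:
    """Cost of a workload given $/M-token prices for input and output."""
    return in_toks / 1e6 * in_per_m + out_toks / 1e6 * out_per_m

# Example workload: 1M input tokens + 200k output tokens.
gpt4_turbo = cost_usd(1_000_000, 200_000, 5.0, 15.0)  # $5/M in, $15/M out
llama_405b = cost_usd(1_000_000, 200_000, 3.0, 3.0)   # $3/M either direction
print(gpt4_turbo, llama_405b)
```

On this example workload that's $8.00 vs. $3.60, i.e. the 405B via the paid route costs less than half as much as GPT-4-Turbo.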
Today is a huge day in Argilla's history. We couldn't be more excited to share this with the community: we're joining Hugging Face!
We're embracing a larger mission, becoming part of a brilliant and kind team, and sharing a vision about the future of AI.
Over the past year, we've been collaborating with Hugging Face on countless projects: becoming a launch partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr's learnings, running the Data is Better Together initiative with hundreds of community contributors, and releasing argilla/OpenHermesPreferences, one of the largest open preference-tuning datasets.
After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we're now the same team.
To those of you who've been following us, this won't be a huge surprise, but it will be a big deal in the coming months. This acquisition means we'll double down on empowering the community to build and collaborate on high-quality datasets, we'll bring full support for multimodal datasets, and we'll be in a better place to collaborate with the open-source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.
As a founder, I am proud of the Argilla team. We're now part of something bigger, a larger team, but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.
Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.
Would love to answer any questions you have so feel free to add them below!
28 replies
·
reacted to lunarflu's post with ❤️, 10 months ago