Supercharge your LLM apps with Langfuse on Hugging Face Spaces!
Langfuse brings end-to-end observability and tooling to accelerate your dev workflow from experiments through production
Now available as a Docker Space directly on the HF Hub! 🤗
- Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks (see the sketch below)
- One-click deployment: on Spaces with persistent storage and integrated OAuth
- Simple prompt management: version, edit, and update without redeployment
- Intuitive evals: collect user feedback, run model/prompt evaluations, and improve quality
- Dataset creation: build datasets directly from production data to enhance future performance
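A minimal sketch of what tracing looks like against a Langfuse Space, using the SDK's OpenAI drop-in wrapper; the Space URL and keys below are placeholders for your own deployment:

```python
import os
from langfuse.openai import OpenAI  # drop-in replacement that auto-traces OpenAI calls

# Point the SDK at your Langfuse Space (placeholder URL and keys; use your own).
os.environ["LANGFUSE_HOST"] = "https://your-username-langfuse.hf.space"
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."

client = OpenAI()  # reads OPENAI_API_KEY from the environment as usual
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello to Langfuse!"}],
)
print(response.choices[0].message.content)
# The call now shows up as a trace in your Langfuse dashboard.
```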
Kudos to the Langfuse team for this collab and the awesome, open-first product they're building! @marcklingen @Clemo @MJannik
Performance leap: TGI v3 is out. Processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config!
3x more tokens.
By reducing our memory footprint, we're able to ingest many more tokens, and more dynamically, than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1-8B, while vLLM barely manages 10k. A lot of work went into reducing the runtime's footprint, and its effects are best seen in smaller, constrained environments.

13x faster
On long prompts (200k+ tokens), conversation replies take 27.5s in vLLM, while they take only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5µs. Thanks @Daniël de Kok for the beast data structure.

Zero config
That's it. Remove all the flags you are using and you're likely to get the best performance. By evaluating the hardware and model, TGI automatically selects values that give the best performance. In production, we no longer use any flags in our deployments. We kept all the existing flags around; they may come in handy in niche scenarios.
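As a rough sketch of the caching effect from the client side (the endpoint URL is a placeholder, assuming a TGI v3 server launched with no tuning flags):

```python
from huggingface_hub import InferenceClient

# Placeholder endpoint for a TGI v3 server launched with no tuning flags.
client = InferenceClient("http://localhost:8080")

messages = [{"role": "user", "content": "<a very long document>... Summarize it."}]
first = client.chat_completion(messages, max_tokens=256)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Follow-up on the same conversation: the long shared prefix is already cached,
# so TGI skips re-processing it and the reply comes back almost instantly.
messages.append({"role": "user", "content": "Now list three key takeaways."})
second = client.chat_completion(messages, max_tokens=256)
print(second.choices[0].message.content)
```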
- Pre-training code with nanotron
- Evaluation suite with lighteval
- Synthetic data generation using distilabel (powers our new SFT dataset HuggingFaceTB/smoltalk)
- Post-training scripts with TRL & the alignment handbook
- On-device tools with llama.cpp for summarization, rewriting & agents
Apache 2.0 licensed. V2 pre-training data mix coming soon!
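For context, a minimal SFT sketch with TRL on the smoltalk dataset; the config name and hyperparameters below are illustrative assumptions, not the actual SmolLM2 recipe (those live in the alignment handbook):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# The "all" config name is an assumption; check the dataset card for subsets.
dataset = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")

# Illustrative hyperparameters only; real recipes are in the alignment handbook.
trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-1.7B",
    train_dataset=dataset,
    args=SFTConfig(output_dir="smollm2-sft", max_steps=100),
)
trainer.train()
```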
Trace LLM calls with Arize AI's Phoenix observability dashboards on Hugging Face Spaces!
✨ I just added a new recipe to the Open-Source AI Cookbook that shows you how to:
1️⃣ Deploy Phoenix on HF Spaces with persistent storage in a few clicks
2️⃣ Configure LLM tracing with the **Serverless Inference API**
3️⃣ Observe multi-agent application runs with the CrewAI integration
**Observability is crucial** for building robust LLM apps.
Phoenix makes it easy to visualize trace data, evaluate performance, and track down issues. Give it a try!
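A minimal sketch of the CrewAI step, assuming the arize-phoenix-otel and openinference-instrumentation-crewai packages; the Space URL is a placeholder:

```python
import os
from phoenix.otel import register
from openinference.instrumentation.crewai import CrewAIInstrumentor

# Placeholder: point the collector at your Phoenix Space.
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://your-username-phoenix.hf.space"

# Register an OpenTelemetry tracer provider that ships spans to Phoenix.
tracer_provider = register(project_name="crewai-demo")

# Instrument CrewAI so every agent, tool, and LLM call is traced automatically.
CrewAIInstrumentor().instrument(tracer_provider=tracer_provider)

# ... define and kick off your Crew as usual; runs appear in the Phoenix UI.
```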
🍷 FineWeb technical report is out, and so is 📚 FineWeb-Edu, a 1.3 trillion token dataset that outperforms all other open web datasets, with remarkable improvements on educational benchmarks such as MMLU, ARC, and OpenBookQA.
We used Llama 3 generations to train an educational quality classifier, filtering the 15 trillion tokens of FineWeb to select only those with high educational value (an approach also used in Llama 3 and Phi-3 training datasets). We're releasing both FineWeb-Edu and the classifier, along with a larger, less heavily filtered version containing 5.4 trillion tokens.
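A quick sketch of scoring a snippet with the released classifier; as far as I understand from the model card, it is a regression head, so the single logit is the educational score (roughly 0 to 5):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceFW/fineweb-edu-classifier")
model = AutoModelForSequenceClassification.from_pretrained("HuggingFaceFW/fineweb-edu-classifier")

text = "Photosynthesis is the process by which plants convert sunlight into energy..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(f"Educational score: {logits.squeeze().item():.2f}")  # ~0 (low) to ~5 (high)
```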
You can find more details about the dataset and the experiments we ran in the FineWeb technical report. It's a 45-minute read, but it contains all the secret sauce for building high-quality web datasets.
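To poke at the data without downloading terabytes, you can stream one of the published sample subsets (the config name below is one of those samples; see the dataset card):

```python
from datasets import load_dataset

# Stream a small published subset rather than the full 1.3T-token dataset.
ds = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                  split="train", streaming=True)

for row in ds.take(3):
    # Each document keeps the classifier's educational score alongside the text.
    print(round(row["score"], 2), row["text"][:100].replace("\n", " "))
```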
Inspired by the awesome work from @mlabonne, I created a Space to monitor the narrowing gap between open and proprietary LLMs, as scored by the LMSYS Chatbot Arena Elo ratings 🤗
The goal is to have a continuously updated place to easily visualize these rapidly evolving industry trends.