Code Llama

company
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

codellama's activity

andrewrreedย 
posted an update 3 months ago
view post
Post
2852
๐Ÿš€ Supercharge your LLM apps with Langfuse on Hugging Face Spaces!

Langfuse brings end-to-end observability and tooling to accelerate your dev workflow from experiments through production

Now available as a Docker Space directly on the HF Hub! ๐Ÿค—

๐Ÿ” Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks
1โƒฃ One-click deployment: on Spaces with persistent storage and integrated OAuth
๐Ÿ›  Simple Prompt Management: Version, edit, and update without redeployment
โœ… Intuitive Evals: Collect user feedback, run model/prompt evaluations, and improve quality
๐Ÿ“Š Dataset Creation: Build datasets directly from production data to enhance future performance

Kudos to the Langfuse team for this collab and the awesome, open-first product theyโ€™re building! ๐Ÿ‘ @marcklingen @Clemo @MJannik

๐Ÿ”— Space: langfuse/langfuse-template-space
๐Ÿ”— Docs: https://huggingface.co/docs/hub/spaces-sdks-docker-langfuse
  • 1 reply
ยท
Narsilย 
posted an update 4 months ago
view post
Post
1537
Performance leap: TGI v3 is out. Processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config !



3x more tokens.

By reducing our memory footprint, weโ€™re able to ingest many more tokens and more dynamically than before. A single L4 (24GB) can handle 30k tokens on llama 3.1-8B, while vLLM gets barely 10k. A lot of work went into reducing the footprint of the runtime and its effect are best seen on smaller constrained environments.
13x faster

On long prompts (200k+ tokens) conversation replies take 27.5s in vLLM, while it takes only 2s in TGI. How so ? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5us. Thanks @Dani รซl de Kok for the beast data structure.
Zero config

Thatโ€™s it. Remove all the flags your are using and youโ€™re likely to get the best performance. By evaluating the hardware and model, TGI carefully selects automatic values to give best performance. In production, we donโ€™t have any flags anymore in our deployments. We kept all existing flags around, they may come in handy in niche scenarios.

Read more: https://huggingface.co/docs/text-generation-inference/conceptual/chunking
loubnabnlย 
posted an update 5 months ago
view post
Post
3424
Making SmolLM2 reproducible: open-sourcing our training & evaluation toolkit ๐Ÿ› ๏ธ https://github.com/huggingface/smollm/

- Pre-training code with nanotron
- Evaluation suite with lighteval
- Synthetic data generation using distilabel (powers our new SFT dataset HuggingFaceTB/smoltalk)
- Post-training scripts with TRL & the alignment handbook
- On-device tools with llama.cpp for summarization, rewriting & agents

Apache 2.0 licensed. V2 pre-training data mix coming soon!

Which other tools should we add next?
andrewrreedย 
posted an update 5 months ago
view post
Post
1064
Trace LLM calls with Arize AI's Phoenix observability dashboards on Hugging Face Spaces! ๐Ÿš€

โœจ I just added a new recipe to the Open-Source AI Cookbook that shows you how to:
1๏ธโƒฃ Deploy Phoenix on HF Spaces with persistent storage in a few clicks
2๏ธโƒฃ Configure LLM tracing with the ๐—ฆ๐—ฒ๐—ฟ๐˜ƒ๐—ฒ๐—ฟ๐—น๐—ฒ๐˜€๐˜€ ๐—œ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ ๐—”๐—ฃ๐—œ
3๏ธโƒฃ Observe multi-agent application runs with the CrewAI integration

๐—ข๐—ฏ๐˜€๐—ฒ๐—ฟ๐˜ƒ๐—ฎ๐—ฏ๐—ถ๐—น๐—ถ๐˜๐˜† ๐—ถ๐˜€ ๐—ฐ๐—ฟ๐˜‚๐—ฐ๐—ถ๐—ฎ๐—น for building robust LLM apps.

Phoenix makes it easy to visualize trace data, evaluate performance, and track down issues. Give it a try!

๐Ÿ”— Cookbook recipe: https://huggingface.co/learn/cookbook/en/phoenix_observability_on_hf_spaces
๐Ÿ”— Phoenix docs: https://docs.arize.com/phoenix
ArthurZย 
posted an update 5 months ago
view post
Post
3878
Native tensor parallel has landed in transformers!!! https://github.com/huggingface/transformers/pull/34184 thanks a lot to the torch team for their support!

Contributions are welcome to support more models! ๐Ÿ”ฅ
loubnabnlย 
posted an update 11 months ago
view post
Post
5885
๐Ÿท FineWeb technical report is out and so is ๐Ÿ“š FineWeb-Edu, a 1.3 trillion tokens dataset that outperforms all other open web datasets, with remarkable improvements on educational benchmarksย such as MMLU, ARC, and OpenBookQA.

Technical report: HuggingFaceFW/blogpost-fineweb-v1
Dataset: HuggingFaceFW/fineweb-edu

We used Llama 3 generations to train an educational quality classifier, filtering the 15 trillion tokens of FineWeb to select only those with high educational value (an approach also used in Llama 3 and Phi-3 training datasets). We're releasing both FineWeb-Edu and the classifier, along with a larger, less heavily filtered version containing 5.4 trillion tokens.

You can find more details about the dataset and the experiments we ran in the FineWeb technical report, It's a 45-minute read but it contains all the secret sauce for building high quality web datasets.

Enjoy!
Narsilย 
posted an update 11 months ago
andrewrreedย 
posted an update 11 months ago
view post
Post
2613
๐Ÿ”ฌ Open LLM Progress Tracker ๐Ÿ”ฌ

Inspired by the awesome work from @mlabonne , I created a Space to monitor the narrowing gap between open and proprietary LLMs as scored by the LMSYS Chatbot Arena ELO ratings ๐Ÿค—

The goal is to have a continuously updated place to easily visualize these rapidly evolving industry trends ๐Ÿš€

๐Ÿ”— Open LLM Progress Tracker: andrewrreed/closed-vs-open-arena-elo
๐Ÿ”— Source of Inspiration: https://www.linkedin.com/posts/maxime-labonne_arena-elo-graph-updated-with-new-models-activity-7187062633735368705-u2jB/
  • 2 replies
ยท
Narsilย 
posted an update 12 months ago