David Berenstein

davidberenstein1957

AI & ML interests

Everything data

Recent Activity

Articles

Organizations

Hugging Face, SomosNLP, Tools, Webhooks Explorers (BETA), Argilla, Blog-explorers, Argilla Explorers, distilabel-internal-testing, Data Is Better Together, Social Post Explorers, argilla-internal-testing, Dataset Viber, Argilla Warehouse, Dataset Tools, Uplimit, Data Is Better Together Contributor, FeeL (Feedback Loop), Smol Blueprint

davidberenstein1957's activity

posted an update about 22 hours ago
reacted to burtenshaw's post with 🚀 about 23 hours ago
reacted to ariG23498's post with 🚀 1 day ago
reacted to Tonic's post with 🔥 2 days ago
view post
Post
1143
🙋🏻‍♂️ Hey there folks,

Facebook AI just released JASCO models that make music stems.

You can try it out here: Tonic/audiocraft

Hope you like it!
reacted to mlabonne's post with 🤗🔥 2 days ago
view post
Post
2000
🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course
reacted to burtenshaw's post with 🚀🔥 3 days ago
view post
Post
22569
We're launching a FREE and CERTIFIED course on Agents!

We're thrilled to announce the launch of the Hugging Face Agents course on Learn! This interactive, certified course will guide you through building and deploying your own AI agents.

Here's what you'll learn:

- Understanding Agents: We'll break down the fundamentals of AI agents, showing you how they use LLMs to perceive their environment (observations), reason about it (thoughts), and take actions. Think of a smart assistant that can book appointments, answer emails, or even write code based on your instructions.
- Building with Frameworks: You'll dive into popular agent frameworks like LangChain, LlamaIndex and smolagents. These tools provide the building blocks for creating complex agent behaviors.
- Real-World Applications: See how agents are used in practice, from automating SQL queries to generating code and summarizing complex documents.
- Certification: Earn a certification by completing the course modules, implementing a use case, and passing a benchmark assessment. This proves your skills in building and deploying AI agents.
Audience

This course is designed for anyone interested in the future of AI. Whether you're a developer, data scientist, or simply curious about AI, this course will equip you with the knowledge and skills to build your own intelligent agents.

Enroll today and start building the next generation of AI agent applications!

https://bit.ly/hf-learn-agents
posted an update 4 days ago
replied to davanstrien's post 8 days ago
view reply

Open collaboration is key for democratising AI.

reacted to davanstrien's post with 🤝❤️🚀 8 days ago
view post
Post
2074
The data-is-better-together/fineweb-c dataset is growing!

This week a few more languages have got 1,000 annotations for the educational quality of data from HuggingFaceFW/fineweb-2.

Why should you care?

The quality of pre-training data can have a big impact on the performance of downstream language models trained on that data (HuggingFaceFW/blogpost-fineweb-v1).

Being able to filter by educational quality is one way of improving the quality of the data you use for training an LLM. Very importantly, this approach can also reduce the amount of data needed for pretraining.
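
To make the filtering idea concrete, here is a minimal sketch in plain Python. It is not the actual FineWeb pipeline: the `educational_score` field, the 0 to 5 scale, and the threshold of 3 are all assumptions chosen for illustration.

```python
def filter_by_quality(rows, min_score):
    """Keep only rows whose (hypothetical) educational score meets the threshold."""
    return [row for row in rows if row["educational_score"] >= min_score]


# Toy rows; the field name and the scores are illustrative assumptions.
rows = [
    {"text": "Photosynthesis converts light into chemical energy.", "educational_score": 4},
    {"text": "BUY NOW!!! Limited offer!!!", "educational_score": 0},
    {"text": "A short note on how tides work.", "educational_score": 3},
]

kept = filter_by_quality(rows, min_score=3)
print(len(kept), "of", len(rows), "rows kept")  # prints: 2 of 3 rows kept
```

Raising the threshold trades data volume for quality, which is why a filter like this can shrink the amount of data used for training.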

Why not use an LLM?

LLMs can be used to annotate educational quality for a subset of data. This data can then be used to train a smaller encoder-only model to label the full dataset. However, this may not work well for languages other than English. This is where fineweb-c (community) comes in.

The community is annotating the educational quality of fineweb2 data. Currently 114 languages have some annotations. These annotations will enable a number of things:

- Evaluate whether an LLM can label the educational quality for texts in that language well
- Directly be used for training quality classifiers
- Help discover other rules and heuristics for refining fineweb2 further for different languages.

This week the following languages were done:
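
As a rough illustration of the first use above, community annotations can act as a reference when checking an LLM's labels. The sketch below computes simple agreement between the two; the label names and the example data are hypothetical and do not reflect the actual fineweb-c annotation scheme.

```python
def agreement_rate(human_labels, llm_labels):
    """Fraction of texts where the LLM label matches the human annotation."""
    if len(human_labels) != len(llm_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("need at least one labelled text")
    matches = sum(h == m for h, m in zip(human_labels, llm_labels))
    return matches / len(human_labels)


# Hypothetical labels for five texts in one language.
human = ["high", "low", "low", "high", "medium"]
llm = ["high", "low", "medium", "high", "medium"]

rate = agreement_rate(human, llm)
print(f"LLM agrees with annotators on {rate:.0%} of texts")  # 80%
```

Raw agreement ignores chance agreement and class imbalance, so in practice a metric such as Cohen's kappa or per-class F1 would likely give a fairer picture.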

Swedish thanks to: @Lauler @AntonVic @ohallstrom @bjarlestam @menbom @Ekgren @apsod

Ukrainian thanks to: @hannayukhymenko @robinhad @realPivo @RabotiahovDmytro @reciprocate

Assamese thanks to: @moyoor97 @Arpanjyoti @nawaf-helmi123 @pahigogoi1 @aelhence @kishorekashyap

Want to learn more: https://huggingface.co/blog/davanstrien/fineweb2-community

Contribute yourself here: data-is-better-together/fineweb-c
  • 1 reply
posted an update 14 days ago
posted an update 19 days ago
posted an update about 1 month ago
reacted to their post with 🔥 about 1 month ago
view post
Post
4198
Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: A simple step-by-step process, making dataset creation a non-technical breeze, allowing anyone to create datasets and models in minutes and without any code.

Blog: https://huggingface.co/blog/synthetic-data-generator
Space: argilla/synthetic-data-generator
  • 4 replies
replied to their post about 1 month ago
replied to their post about 1 month ago
view reply

Thanks! Hope you can create some cool and useful datasets with it!

reacted to jwlben11's post with 🤗 about 1 month ago
view post
Post
2144
What is the use of Hugging Face? How can I get up to speed on ML and AI, and on how to use this platform? It would be nice if there were a "get started here" section.
  • 1 reply