Your Bench

huggingface/yourbench

Activity Feed

Recent Activity

sumuks updated a dataset about 2 months ago

yourbench/llm-pdf-ingestion-demo

thomwolf

authored a paper about 2 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 68

lvwerra

authored a paper about 2 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 68

dilekht

authored 7 papers 2 months ago

YourBench: Easy Custom Evaluation Sets for Everyone

Paper • 2504.01833 • Published Apr 2 • 22

From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents

Paper • 2410.23555 • Published Oct 31, 2024

Better Slow than Sorry: Introducing Positive Friction for Reliable Dialogue Systems

Paper • 2501.17348 • Published Jan 28

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 46

TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons

Paper • 2504.19982 • Published Apr 28

Language Specific Knowledge: Do Models Know Better in X than in English?

Paper • 2505.14990 • Published May 21 • 1

PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents

Paper • 2505.01592 • Published May 2

thomwolf

authored a paper 3 months ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2 • 128

sumuks

authored a paper 3 months ago

PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents

Paper • 2505.01592 • Published May 2

clefourrier

posted an update 3 months ago

Always surprised that so few people actually read the FineTasks blog, on
✨how to select training evals with the highest signal✨

If you're serious about training models without wasting compute on shitty runs, you absolutely should read it!!

A high-signal eval actually tells you precisely, during training, how well and what your model is learning, allowing you to discard the bad runs/bad samplings/...!

The blog covers prompt choice, metrics, and datasets in depth, across languages/capabilities, and my fave section is "which properties should evals have"👌
(so you know how to select the best evals for your use case)

Blog: HuggingFaceFW/blogpost-fine-tasks

thomwolf

posted an update 4 months ago

If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face—in robotics and across all AI fields—we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website and the Pollen community and people here on the hub at