Daniel van Strien's picture

Daniel van Strien PRO

davanstrien

AI & ML interests

Machine Learning Librarian

Recent Activity

updated a dataset about 5 hours ago
data-is-better-together/fineweb-c-progress
updated a dataset about 6 hours ago
librarian-bots/model_cards_with_metadata
updated a dataset about 22 hours ago
librarian-bots/dataset_cards_with_metadata
View all activity

Organizations

Hugging Face's profile picture Notebooks-explorers's profile picture Nasjonalbiblioteket AI Lab's profile picture Living with Machines's profile picture BigScience Workshop's profile picture Spaces-explorers's profile picture BigScience Catalogue Data's profile picture Hacks/Hackers's profile picture flyswot's profile picture BigScience: LMs for Historical Texts's profile picture Webhooks Explorers (BETA)'s profile picture HuggingFaceM4's profile picture Open Access AI Collective's profile picture HF Canonical Model Maintainers's profile picture BigLAM: BigScience Libraries, Archives and Museums's profile picture Hugging Face OSS Metrics's profile picture ImageIN's profile picture Stable Diffusion Bias Eval's profile picture Librarian Bots's profile picture Blog-explorers's profile picture Hacktoberfest 2023's profile picture Hugging Face Smol Models Research's profile picture geospatial's profile picture HPLT's profile picture HF-IA-archiving's profile picture 2A2I Legacy Models & Datasets's profile picture testy's profile picture DIBT-for-Klingon's profile picture Wikimedia Movement's profile picture DIBT-for-Esperanto's profile picture Journalists on Hugging Face's profile picture PleIAs's profile picture Persian AI Community's profile picture HuggingFaceFW's profile picture Data Is Better Together's profile picture Social Post Explorers's profile picture OMOTO AI's profile picture academic-datasets's profile picture HuggingFaceFW-Dev's profile picture Hugging Face Discord Community's profile picture UCSF-JHU Opioid Industry Documents Archive's profile picture Dataset Tools's profile picture PDFPages's profile picture dibt-private's profile picture Data Is Better Together Contributor's profile picture Bluesky Community's profile picture Open R1's profile picture Reasoning datasets competition 's profile picture

Posts 51

view post
Post
1430
I've created a v1 dataset ( davanstrien/reasoning-required) and model ( davanstrien/ModernBERT-based-Reasoning-Required) to help curate "wild text" data for generating reasoning examples beyond the usual code/math/science domains.

- I developed a "Reasoning Required" dataset with a 0-4 scoring system for reasoning complexity
- I used educational content from HuggingFaceFW/fineweb-edu, adding annotations for domains, reasoning types, and example questions

My approach enables a more efficient workflow: filter text with small models first, then use LLMs only on high-value content.

This significantly reduces computation costs while expanding reasoning dataset domain coverage.

Articles 21

Article
304

Open-R1: Update #1