🧠 Reasoning datasets (Collection): Datasets with reasoning traces for math and code released by the community • 24 items • Updated 16 days ago • 146
Article: Improving Parquet Dedupe on Hugging Face Hub • By yuchenglow and 1 other • Oct 5, 2024 • 33
Article: Introducing BERTopic Integration with Hugging Face Hub • By davanstrien and 1 other • May 31, 2023 • 9
Article: Data exploration and filtering with Nomic Atlas • By visheratin • Mar 22, 2024 • 5
Article: Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • By Leyo and 2 others • Apr 15, 2024 • 180
Article: Fine-Tuning Gemma Models in Hugging Face • By svaibhav and 3 others • Feb 23, 2024 • 33
Article: The 5 Most Under-Rated Tools on Hugging Face • By derek-thomas • Aug 22, 2024 • 89
Article: SmolLM - blazingly fast and remarkably powerful • By loubnabnl and 2 others • Jul 16, 2024 • 374
Article: Docmatix - a huge dataset for Document Visual Question Answering • By andito and 1 other • Jul 18, 2024 • 73
Article: Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • By loubnabnl and 2 others • Mar 20, 2024 • 94
Article: Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality • By evijit and 9 others • Jun 24, 2024 • 34
Article: Experimenting with Automatic PII Detection on the Hub using Presidio • By lhoestq and 3 others • Jul 10, 2024 • 24
Article: Announcing New Dataset Search Features • By lhoestq and 2 others • Jul 8, 2024 • 23
Article: How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o • By chilijung • May 31, 2024 • 11
Article: Synthetic dataset generation techniques: generating custom sentence similarity data • By davanstrien • May 23, 2024 • 16