WildChat-50m Collection All model responses associated with the WildChat-50m paper. • 55 items • Updated Jan 29 • 8
Whisper Release Collection Whisper includes both English-only and multilingual checkpoints for ASR and ST, ranging from 38M params for the tiny models to 1.5B params for large. • 12 items • Updated Sep 13, 2023 • 99
SWE-bench Collection SWE-bench is a benchmark for evaluating language models and AI systems on their ability to resolve real-world GitHub issues. • 4 items • Updated 20 days ago • 3
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 18 items • Updated 1 day ago • 115
MAmmoTH2 Collection Scaling up instruction data from the web to build better LLMs • 13 items • Updated Dec 9, 2024 • 11
MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 58
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 654
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action Paper • 2312.17172 • Published Dec 28, 2023 • 28