EMMA-500 Collection Enhancing massively multilingual adaptation of LLMs on 500+ languages https://mala-lm.github.io • 9 items • Updated 4 days ago • 4
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6, 2024 • 130
Bad Data Toolbox Collection PleIAs collection of models for the data processing of challenging document and data sources. • 5 items • Updated Jul 18, 2024 • 16
Common Corpus Collection Largest multilingual pretraining data. • 1 item • Updated Nov 13, 2024 • 12
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 144
On Domain-Specific Post-Training for Multimodal Large Language Models Paper • 2411.19930 • Published Nov 29, 2024 • 29
Article Welcome to Inference Providers on the Hub 🔥 By julien-c and 6 others • Jan 28 • 483
SmolVLM 256M & 500M Collection Collection for models & demos for even smoller SmolVLM release • 12 items • Updated May 5 • 77
How to Synthesize Text Data without Model Collapse? Paper • 2412.14689 • Published Dec 19, 2024 • 53
Article Use Models from the Hugging Face Hub in LM Studio By yagilb • Nov 28, 2024 • 138
Multilingual LLM Evaluation Collection Multilingual Evaluation Benchmarks • 8 items • Updated Mar 3 • 25
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 15 items • Updated Apr 18 • 231
LLMs for Extremely Low-Resource Finno-Ugric Languages Paper • 2410.18902 • Published Oct 24, 2024 • 3
MaLA-LM Collection MaLA-LM: Massive Language Adaptation of Large Language Models • 9 items • Updated 20 days ago • 1
4M Models Collection Multimodal models from https://4m.epfl.ch/ • 17 items • Updated Mar 7 • 31
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 77