classla/xlm-roberta-base-multilingual-text-genre-classifier Text Classification • Updated 15 days ago • 8.36k • 28
classla/multilingual-IPTC-news-topic-classifier Text Classification • Updated Dec 6, 2024 • 52.1k • 9
Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages Paper • 2403.08693 • Published Mar 13, 2024 • 2
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification Paper • 2411.19638 • Published Nov 29, 2024 • 6 • 2
FineWeb: decanting the web for the finest text data at scale 🍷 Generate high-quality web text data for LLM training • Running • 602
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining Paper • 2404.05428 • Published Apr 8, 2024