Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models • Paper • 2406.09206 • Published Jun 13, 2024 • 1
OpenCulture • Collection • A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6, 2024 • 122
EU20-Benchmarks • Collection • Evaluation Benchmarks for 20 European languages. • 5 items • Updated Oct 11, 2024 • 7
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time • Paper • 2408.13233 • Published Aug 23, 2024 • 22
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies • Paper • 2407.13623 • Published Jul 18, 2024 • 54
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs • Paper • 2407.03963 • Published Jul 4, 2024 • 16
AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets • Paper • 2404.05623 • Published Apr 8, 2024 • 3
🎧AI Podcasts and Talks! • Collection • 🤗Cool stuff to listen to at any time! • 10 items • Updated Oct 6, 2023 • 5
Small-Text: Active Learning for Text Classification in Python • Paper • 2107.10314 • Published Jul 21, 2021 • 1