BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction Paper • 2503.19658 • Published 13 days ago • 2
AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization Paper • 2503.22526 • Published 9 days ago • 2
TextBite: A Historical Czech Document Dataset for Logical Page Segmentation Paper • 2503.16664 • Published 17 days ago • 2
Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages Paper • 2502.10140 • Published Feb 14 • 9
Falcon3 Collection The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. • 40 items • Updated Feb 13 • 84
EU20-Benchmarks Collection Evaluation Benchmarks for 20 European languages. • 5 items • Updated Oct 11, 2024 • 8
Qwen2.5 Collection Qwen2.5 language models: pretrained and instruction-tuned models in 7 sizes (0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B). • 46 items • Updated Feb 26 • 581
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12, 2024 • 137
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20, 2024 • 78