Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress Paper • 2408.14960 • Published Aug 27, 2024
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning Paper • 2410.10801 • Published Oct 14, 2024
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge Paper • 2411.19799 • Published Nov 29, 2024
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation Paper • 2412.03304 • Published Dec 4, 2024
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models Paper • 2406.03368 • Published Jun 5, 2024
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024
M-RewardBench: Evaluating Reward Models in Multilingual Settings Paper • 2410.15522 • Published Oct 20, 2024
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm Paper • 2406.18682 • Published Jun 26, 2024
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives Paper • 2407.01490 • Published Jul 1, 2024
On the Limitations of Compute Thresholds as a Governance Strategy Paper • 2407.05694 • Published Jul 8, 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons Paper • 2407.14933 • Published Jul 20, 2024
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20, 2024
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024
From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models Paper • 2403.03893 • Published Mar 6, 2024
Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published May 23, 2024
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model Paper • 2402.07827 • Published Feb 12, 2024
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning Paper • 2402.06619 • Published Feb 9, 2024