AdinaY
posted an update May 18
Data quality is the new frontier for LLM performance.

Ultra-FineWeb 📊 a high-quality bilingual dataset released by OpenBMB

openbmb/Ultra-FineWeb

✨ MIT License
✨ 1T English + 120B Chinese tokens
✨ Efficient model-driven filtering

Where's the data in the dataset... lol

coming soon