@lamhieu on Hugging Face: "🚀 Introducing the xLLMs Dataset Collection The xLLMs project is a growing…"


lamhieu

posted an update Oct 19, 2025


🚀 Introducing the xLLMs Dataset Collection

The xLLMs project is a growing suite of multilingual and multimodal dialogue datasets designed to train and evaluate advanced conversational LLMs. Each dataset focuses on a specific capability — from long-context reasoning and factual grounding to STEM explanations, math Q&A, and polite multilingual interaction.

🌍 Explore the full collection on Hugging Face:
👉 lamhieu/xllms-66cdfe34307bb2edc8c6df7d

💬 Highlight: xLLMs – Dialogue Pubs
A large-scale multilingual dataset built from document-guided synthetic dialogues (Wikipedia, WikiHow, and technical sources). It’s ideal for training models on long-context reasoning, multi-turn coherence, and tool-augmented dialogue across 9 languages.
👉 lamhieu/xllms_dialogue_pubs
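For anyone who wants to inspect Dialogue Pubs before training on it, here is a minimal sketch that uses the Hugging Face `datasets` library. The post does not document config names, split names, or the record schema. The snippet therefore assumes the repo loads with `load_dataset` and discovers the config and split at runtime instead of hard-coding them. `preview_dataset` and `REPO_ID` are hypothetical names chosen for illustration.

```python
"""Peek at a few records from the xLLMs Dialogue Pubs dataset.

Assumptions (not confirmed by the post): the dataset loads with the
Hugging Face `datasets` library, and its config/split names are looked
up at runtime rather than hard-coded.
"""
from itertools import islice

REPO_ID = "lamhieu/xllms_dialogue_pubs"  # dataset ID from the post


def preview_dataset(repo_id: str = REPO_ID, n: int = 2) -> list:
    """Stream the first `n` records of the first config/split found (hypothetical helper)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import (
        get_dataset_config_names,
        get_dataset_split_names,
        load_dataset,
    )

    # The post mentions 9 languages; these may or may not map to separate
    # configs, so use whatever configs the repo actually exposes.
    configs = get_dataset_config_names(repo_id)
    config = configs[0]
    splits = get_dataset_split_names(repo_id, config)
    # streaming=True iterates over records without downloading the whole dataset.
    ds = load_dataset(repo_id, config, split=splits[0], streaming=True)
    return list(islice(ds, n))


if __name__ == "__main__":
    for record in preview_dataset():
        # Field names are not documented in the post; print the keys to inspect them.
        print(sorted(record.keys()))
```

Streaming keeps the first look cheap for a large-scale dataset. Once the actual configs and fields are known, pass an explicit config name instead of taking the first one.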

🧠 Designed for:
- Long-context and reasoning models
- Multilingual assistants
- Tool-calling and structured response learning

All datasets are open for research and development use — free, transparent, and carefully curated to improve dialogue model quality.

cqhofsns

Oct 19, 2025

Thank you for this contribution!!!

lamhieu

Oct 19, 2025

go ahead and create something interesting ;)

Keeby-smilyai

Oct 20, 2025

Wow great datasets!

lamhieu

Oct 21, 2025

using to build tf great ;)
