Introducing the xLLMs Dataset Collection
The xLLMs project is a growing suite of multilingual and multimodal dialogue datasets designed to train and evaluate advanced conversational LLMs. Each dataset focuses on a specific capability, from long-context reasoning and factual grounding to STEM explanations, math Q&A, and polite multilingual interaction.
Explore the full collection on Hugging Face:
lamhieu/xllms-66cdfe34307bb2edc8c6df7d
Highlight: xLLMs - Dialogue Pubs
A large-scale multilingual dataset built from document-guided synthetic dialogues (Wikipedia, WikiHow, and technical sources). It's ideal for training models on long-context reasoning, multi-turn coherence, and tool-augmented dialogue across 9 languages.
lamhieu/xllms_dialogue_pubs
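For readers who want a quick look at the data, here is a minimal sketch of how the Dialogue Pubs repository might be inspected with the Hugging Face `datasets` library. The repository ID comes from the post; the helper name `peek` is hypothetical, and because the post does not list configuration, split, or field names, the sketch discovers them at run time instead of assuming any.

```python
"""Sketch: stream a few records from Dialogue Pubs with the `datasets` library."""
from itertools import islice

REPO_ID = "lamhieu/xllms_dialogue_pubs"  # repository ID quoted in the post


def peek(repo_id: str = REPO_ID, n: int = 2) -> list:
    """Return the first `n` records of the first config and split found.

    `peek` is a hypothetical helper, not part of any xLLMs tooling.
    """
    # Imported lazily so this module loads even where `datasets` is not installed.
    from datasets import (
        get_dataset_config_names,
        get_dataset_split_names,
        load_dataset,
    )

    # The post does not name configs or splits, so look them up rather than guess.
    config = get_dataset_config_names(repo_id)[0]
    split = get_dataset_split_names(repo_id, config)[0]

    # streaming=True reads records lazily instead of downloading the whole dataset.
    stream = load_dataset(repo_id, config, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek()` requires network access and `pip install datasets`. Printing `sorted(record.keys())` for each returned record is a reasonable first step, since the record schema is not described in the post.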
Designed for:
- Long-context and reasoning models
- Multilingual assistants
- Tool-calling and structured response learning
All datasets are open for research and development use: free, transparent, and carefully curated to improve dialogue model quality.