Hieu Lam

lamhieu

https://lh0x00.dev

Posts 17

🚀 Introducing the xLLMs Dataset Collection

The xLLMs project is a growing suite of multilingual and multimodal dialogue datasets designed to train and evaluate advanced conversational LLMs. Each dataset targets a specific capability, ranging from long-context reasoning and factual grounding to STEM explanations, math Q&A, and polite multilingual interaction.

🌍 Explore the full collection on Hugging Face:
👉 lamhieu/xllms-66cdfe34307bb2edc8c6df7d

💬 Highlight: xLLMs – Dialogue Pubs
A large-scale multilingual dataset built from document-guided synthetic dialogues (Wikipedia, WikiHow, and technical sources). It’s ideal for training models on long-context reasoning, multi-turn coherence, and tool-augmented dialogue across 9 languages.
👉 lamhieu/xllms_dialogue_pubs

🧠 Designed for:
- Long-context and reasoning models
- Multilingual assistants
- Tool-calling and structured response learning

All datasets are open for research and development use: free, transparent, and carefully curated to improve dialogue model quality.

Articles 3

Supercharge Your Semantic Search with embs

Collections 5

Spaces 5

Lightweight Embeddings API

Generate embeddings and rerank text or images

Docsifer

Convert files to Markdown

Ghost 8B Beta (β, 128k)

Chat with your multilingual A.I. assistant.

Ghost 8B Beta (β, 8k)

Chat with your multilingual A.I. assistant.

Ghost 8b Beta Coder (Etherll)

Models 1

lamhieu/distilbert-base-multilingual-cased-vietnamese-topicifier

Text Classification • 0.1B • Updated Mar 21, 2023 • 3

Datasets 37

lamhieu/xllms_dialogue_greetings

Viewer • Updated Oct 19, 2025 • 41.3k • 33 • 2

lamhieu/xllms_dialogue_pubs

Viewer • Updated Oct 19, 2025 • 999k • 18 • 4

lamhieu/xllms_dialogue_wildchat

Viewer • Updated Sep 4, 2024 • 206k • 27 • 1

lamhieu/xllms_dialogue_stem

Viewer • Updated Sep 4, 2024 • 110k • 20

lamhieu/xllms_dialogue_mathqa

Viewer • Updated Sep 4, 2024 • 395k • 13

lamhieu/itorca_dpo_en

Viewer • Updated Jul 1, 2024 • 5.92k • 31 • 1

lamhieu/beyond_dpo_en

Viewer • Updated Jul 1, 2024 • 25k • 22

lamhieu/itorca_dpo_vi

Viewer • Updated Jul 1, 2024 • 12.9k • 75

lamhieu/beyond_dpo_vi

Viewer • Updated Jul 1, 2024 • 25k • 16

lamhieu/wikihow_summarize_dialogue_vi

Viewer • Updated May 17, 2024 • 6.62k • 17 • 2
