Vietnamese Corpus
Symato/cc Updated Jul 11, 2023 • 261k • 2
Symato/c4_vi-filtered_200GB Viewer • Updated Sep 27, 2024 • 38.6M • 154
Symato/goods_vs_c4_cc_classifiers Viewer • Updated Jul 3, 2023 • 101k • 14
Symato/madlad-400_vi Viewer • Updated Sep 27, 2024 • 54.8M • 303

RAG
RAG-related datasets and tools
Symato/RAG_UltraDomain Preview • Updated Sep 25, 2024 • 104 • 2
jinaai/jina-colbert-v2 0.6B • Updated Jan 17 • 94.4k • 131
ContextualBench-Leaderboard 🥇 View and submit LLM benchmark evaluations • Running • 14
samaya-ai/msmarco-w-instructions Viewer • Updated Sep 18, 2024 • 980k • 315 • 3

Visual Datasets
One image is worth a thousand words
TIGER-Lab/VisualWebInstruct-Seed Viewer • Updated Mar 16 • 60.3k • 637 • 18
5CD-AI/Viet-ShareGPT-4o-Text-VQA Viewer • Updated Oct 1, 2024 • 42.7k • 319 • 50
5CD-AI/Viet-LAION-Gemini-VQA Viewer • Updated Oct 3, 2024 • 844k • 71 • 45
vidore/colpali_train_set Viewer • Updated Jun 20 • 119k • 4.75k • 85

trimm_vocab
Trims the vocabulary down to English and Vietnamese tokens so the model is more compact and does not produce Chinese text during use
Symato/Qwen2.5-7B-Instruct__trimm_vocab Updated Oct 21, 2024 • 2
Symato/bge-reranker-v2-m3__trimm_vocab__bf16 0.4B • Updated Oct 18, 2024 • 5
Symato/bge-m3__trimm_vocab__bf16 0.4B • Updated Oct 22, 2024 • 4
Symato/facebook_xlm-roberta-large__trimm_vocab__bf16 0.4B • Updated Oct 18, 2024 • 7
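
The idea behind vocabulary trimming can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the procedure actually used to build the models listed above: `trim_vocab` and `has_no_cjk` are hypothetical helper names, the vocabulary is a plain token-to-id dict, and the embedding matrix is a list of rows. A real pipeline would also have to keep special tokens, rebuild the tokenizer files, and shrink the output projection consistently.

```python
def has_no_cjk(token):
    """Hypothetical filter: True if the token has no CJK Unified Ideographs."""
    return not any("\u4e00" <= ch <= "\u9fff" for ch in token)


def trim_vocab(vocab, embeddings, keep_token):
    """Keep only tokens accepted by keep_token and renumber them densely.

    vocab:      dict mapping token string -> old id
    embeddings: list of embedding rows, indexed by old id
    keep_token: predicate deciding which tokens survive

    Returns (new_vocab, new_embeddings, old_to_new), where old_to_new
    maps each surviving old id to its new id.
    """
    # Sort by old id so the surviving tokens keep their relative order.
    kept = sorted(
        (old_id, token) for token, old_id in vocab.items() if keep_token(token)
    )
    new_vocab, new_embeddings, old_to_new = {}, [], {}
    for new_id, (old_id, token) in enumerate(kept):
        new_vocab[token] = new_id
        old_to_new[old_id] = new_id
        # Reuse the original row so learned weights are preserved.
        new_embeddings.append(embeddings[old_id])
    return new_vocab, new_embeddings, old_to_new


# Toy data: four tokens, one of them Chinese.
vocab = {"hello": 0, "xin": 1, "你好": 2, "chào": 3}
embeddings = [[0.0], [1.0], [2.0], [3.0]]
new_vocab, new_embeddings, old_to_new = trim_vocab(vocab, embeddings, has_no_cjk)
```

In this toy run the Chinese token is dropped and the remaining three tokens are renumbered 0 to 2, with their original embedding rows carried over unchanged.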

Knowledge Base
Few but high quality
Symato/KB_wikimedia Viewer • Updated Sep 27, 2024 • 1.29M • 108 • 1
Symato/wikihow_vi-en-zh Viewer • Updated Sep 27, 2024 • 9.24k • 24 • 1
Symato/KB_tve-selected-books Updated Sep 28, 2024 • 8

Vietnamese LLMs
The good ones
SeaLLMs/SeaLLMs-v3-7B-Chat Text Generation • 8B • Updated Sep 2, 2024 • 1.57k • 58
CohereLabs/c4ai-command-r-plus-08-2024 Text Generation • 104B • Updated Apr 15 • 2.77k • 274
google/gemma-2-27b-it Text Generation • 27B • Updated Aug 27, 2024 • 106k • 548
Viet-Mistral/Vistral-7B-Chat Text Generation • 7B • Updated Feb 27, 2024 • 2.68k • 142