A high-quality Vietnamese pretraining dataset for LLMs
			
	
AI & ML interests: None defined yet.
Models: 0 (none public yet)
Datasets: 17

group2sealion/vnu-hard-clean • 29.8k • 24
group2sealion/web_science_extract • 11.6k • 5
group2sealion/qwen-gen-vnu • 856 • 9
group2sealion/vnu_crawl • 42.2k • 14
group2sealion/15mil_milestone • 2.43M • 5
group2sealion/sft_eval • 223 • 5
group2sealion/4mil_milestone • 2.53M • 24
group2sealion/11mil_last • 1.85M • 2
group2sealion/8mil_last • 1.85M • 17
group2sealion/last_result • 1.82M • 2