Data Synthesis - a VoladorLuYu Collection

VoladorLuYu 's Collections

Research on LLM

Generative Multiple Modality

Super Alignment

Foundation Machine Learning

Graph Foundation Multimodal Models

Symbolic LLM Reasoning

Data-efficient LLMs

Understanding LLM

synthetic code generation

Diffusion Models

LLM+Architecture

LLM+Self-Play RL

Data Synthesis

updated Oct 30, 2024

Best Practices and Lessons Learned on Synthetic Data for Language Models

Paper • 2404.07503 • Published Apr 11, 2024 • 32
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4, 2024 • 62
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 70
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

Paper • 2402.13064 • Published Feb 20, 2024 • 49
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning

Paper • 2410.19290 • Published Oct 25, 2024 • 10