Collections
Discover the best community collections!
Collections including paper arxiv:2410.01257
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 128
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 56
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 15
LLM Agent Operating System
Paper • 2403.16971 • Published • 71
-
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
Text Generation • 71B • Updated • 269k • 2.05k
nvidia/Llama-3.1-Nemotron-70B-Reward-HF
71B • Updated • 1.39k • 87
nvidia/HelpSteer2
Viewer • Updated • 21.4k • 1.91k • 419
HelpSteer2-Preference: Complementing Ratings with Preferences
Paper • 2410.01257 • Published • 25
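This collection groups the HelpSteer2-Preference paper with the Nemotron-70B checkpoints and the HelpSteer2 dataset it complements, all hosted on the Hugging Face Hub. Below is a minimal sketch of loading these artifacts with the standard `datasets` and `transformers` APIs; the repo IDs are taken from the listing above, while the split, dtype/device settings, and the example prompt are illustrative assumptions rather than an official recipe.

```python
# Minimal sketch, assuming the standard `datasets` / `transformers` / `accelerate`
# stack; repo IDs come from the collection above, everything else is illustrative.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# HelpSteer2: prompt/response pairs with per-attribute ratings.
helpsteer2 = load_dataset("nvidia/HelpSteer2", split="train")
print(helpsteer2[0])

# The 70B instruct checkpoint is large; device_map="auto" shards it across
# whatever GPUs are available (requires `accelerate`).
model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Chat-style generation via the tokenizer's chat template.
messages = [{"role": "user", "content": "How many r's are in the word 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```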
-
KTO: Model Alignment as Prospect Theoretic Optimization
Paper • 2402.01306 • Published • 17
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 62
SimPO: Simple Preference Optimization with a Reference-Free Reward
Paper • 2405.14734 • Published • 11
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
Paper • 2408.06266 • Published • 10
-
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 67
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 42
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 51
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 32