- IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization (arXiv:2411.06208, Nov 2024)
- LOGO: Long cOntext aliGnment via efficient preference Optimization (arXiv:2410.18533, Oct 2024)
- Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback (arXiv:2410.19133, Oct 2024)
- Training Language Models to Self-Correct via Reinforcement Learning (arXiv:2409.12917, Sep 19, 2024)
- Towards a Unified View of Preference Learning for Large Language Models: A Survey (arXiv:2409.02795, Sep 4, 2024)
- Building and better understanding vision-language models: insights and future directions (arXiv:2408.12637, Aug 22, 2024)
- Course-Correction: Safety Alignment Using Synthetic Preferences (arXiv:2407.16637, Jul 23, 2024)
- OpenDevin: An Open Platform for AI Software Developers as Generalist Agents (arXiv:2407.16741, Jul 23, 2024)
- Towards Building Specialized Generalist AI with System 1 and System 2 Fusion (arXiv:2407.08642, Jul 11, 2024)
- Understanding Alignment in Multimodal LLMs: A Comprehensive Study (arXiv:2407.02477, Jul 2, 2024)
- AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology (arXiv:2406.11912, Jun 16, 2024)
- DataComp-LM: In search of the next generation of training sets for language models (arXiv:2406.11794, Jun 17, 2024)