Direct Language Model Alignment from Online AI Feedback Paper • 2402.04792 • Published Feb 7, 2024
SLiC-HF: Sequence Likelihood Calibration with Human Feedback Paper • 2305.10425 • Published May 17, 2023
Statistical Rejection Sampling Improves Preference Optimization Paper • 2309.06657 • Published Sep 13, 2023
Calibrating Sequence Likelihood Improves Conditional Language Generation Paper • 2210.00045 • Published Sep 30, 2022
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023
LiPO: Listwise Preference Optimization through Learning-to-Rank Paper • 2402.01878 • Published Feb 2, 2024