LLM-Alignment Papers

A Hugging Face collection by rbgo, updated Sep 9, 2024.

  • Concrete Problems in AI Safety

    Paper • arXiv:1606.06565 • Published Jun 21, 2016

  • The Off-Switch Game

    Paper • arXiv:1611.08219 • Published Nov 24, 2016

  • Learning to summarize from human feedback

    Paper • arXiv:2009.01325 • Published Sep 2, 2020

  • Truthful AI: Developing and governing AI that does not lie

    Paper • arXiv:2110.06674 • Published Oct 13, 2021

  • Scaling Laws for Neural Language Models

    Paper • arXiv:2001.08361 • Published Jan 23, 2020

  • Training language models to follow instructions with human feedback

    Paper • arXiv:2203.02155 • Published Mar 4, 2022

  • Constitutional AI: Harmlessness from AI Feedback

    Paper • arXiv:2212.08073 • Published Dec 15, 2022

  • Discovering Language Model Behaviors with Model-Written Evaluations

    Paper • arXiv:2212.09251 • Published Dec 19, 2022

  • Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

    Paper • arXiv:2406.09264 • Published Jun 13, 2024

  • Scalable AI Safety via Doubly-Efficient Debate

    Paper • arXiv:2311.14125 • Published Nov 23, 2023
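The items above can also be retrieved programmatically. The following is a minimal sketch, not part of the original page, using the huggingface_hub client's get_collection helper; the collection slug shown is a hypothetical placeholder, since the real slug includes a hash suffix that appears in the collection's URL.

    # Minimal sketch: list the papers in a Hugging Face collection.
    # The slug below is a hypothetical placeholder -- copy the real one
    # (owner/title-hash) from the collection page's URL.
    from huggingface_hub import get_collection

    COLLECTION_SLUG = "rbgo/llm-alignment-papers-0123456789abcdef"  # hypothetical

    collection = get_collection(COLLECTION_SLUG)
    print(collection.title)

    # Collection items carry an item_type ("paper", "model", "dataset", ...)
    # and an item_id (for papers, the arXiv identifier).
    for item in collection.items:
        if item.item_type == "paper":
            print(item.item_id)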