DeepHermes Collection Preview models of hybrid reasoner Hermes series • 6 items • Updated 12 days ago • 27
MetaSC: Test-Time Safety Specification Optimization for Language Models Paper • 2502.07985 • Published Feb 11 • 3
Toxic Commons Collection Tools for de-toxifying public domain data, especially multilingual and historical text data and data with OCR errors. • 3 items • Updated Oct 31, 2024 • 6
steiner-preview Collection Reasoning models trained on synthetic data using reinforcement learning. • 3 items • Updated Oct 20, 2024 • 32
Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems Paper • 2410.13334 • Published Oct 17, 2024 • 13
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 139
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models Paper • 2408.03837 • Published Aug 7, 2024 • 18
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 654