Offline Regularised Reinforcement Learning for Large Language Models Alignment Paper • 2405.19107 • Published May 29, 2024 • 15
Running on Zero • 143 • Gemma 2 llama.cpp 2B/9B/27B • Chat with Gemma 2 for text-based conversations