RLAIF/sft-gemma-2-9b-base-sft-llama-405b-instruct-correct-only-format-lr-5e-06-bs-64 Text Generation • 9B • Updated Oct 30, 2024
RLAIF/22-sequential-temp-0-verifier-oracle-in-context-train-8-w-error-masking 8B • Updated Oct 11, 2024
RLAIF/15-w-error-masking-temp-0-verifier-in-context-train-in-context-inference-8-model 8B • Updated Sep 30, 2024 • 2