---
license: apache-2.0
datasets:
- nicholasKluge/reward-aira-dataset
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---
# Aira-2-124M-DPO-checkpoint-200

## Hyperparameters

```yaml
model_args:
  base_model: "nicholasKluge/Aira-2-124M"
  model_ref: "nicholasKluge/Aira-2-124M"
  cache_dir: null
data_args:
  dataset_name: "nicholasKluge/reward-aira-dataset"
  dataset_split: "english"
  validation_split_percentage: null
  streaming: false
  max_prompt_length: 150
  max_length: 600
  sanity_check: false
training_args:
  output_dir: "checkpoints"
  do_eval: false
  evaluation_strategy: "no"
  save_strategy: "steps"
  logging_strategy: "steps"
  logging_steps: 200
  max_steps: 2400
  save_steps: 200
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  gradient_accumulation_steps: 1
  gradient_checkpointing: false
  optim: "adamw_torch"
  learning_rate: 0.00005
  lr_scheduler_type: "cosine"
  warmup_steps: 100
  hub_token: null
  push_to_hub: false
  hub_model_id: null
extra_args:
  project_name: "Aira-2"
  wandb_token: null
  beta: 0.8
```

## Logs

| Key                   | Value                           |
|-----------------------|---------------------------------|
| loss                  | 0.2274                          |
| learning_rate         | 4.976714865090827e-05           |
| rewards/chosen        | -33.849693298339844             |
| rewards/rejected      | -114.72045135498047             |
| rewards/accuracies    | 0.9768750071525574              |
| rewards/margins       | 80.87075805664062               |
| logps/rejected        | -404.8834228515625              |
| logps/chosen          | -383.7469482421875              |
| logits/rejected       | -67.6454086303711               |
| logits/chosen         | -30.543472290039062             |
| epoch                 | 0.05                            |

## Eval

|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.2031|±  |0.0118|
|             |       |acc_norm|0.2491|±  |0.0126|
|toxigen      |      0|acc     |0.5521|±  |0.0162|
|             |       |acc_norm|0.4340|±  |0.0162|
|truthfulqa_mc|      1|mc1     |0.2485|±  |0.0151|
|             |       |mc2     |0.4368|±  |0.0153|
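
## Sanity check of logged values

The sketch below cross-checks two of the logged values against the hyperparameters. It is not part of the original training code and relies on two assumptions: that the log row was recorded at step 200 (inferred only from the checkpoint name and `logging_steps: 200`), and that `lr_scheduler_type: "cosine"` corresponds to linear warmup followed by a half-cosine decay to zero. The function name `cosine_lr` and the constant names are hypothetical. It also checks that `rewards/margins` is roughly `rewards/chosen` minus `rewards/rejected`.

```python
import math

# Values copied from the Hyperparameters section.
BASE_LR = 0.00005
WARMUP_STEPS = 100
MAX_STEPS = 2400

# Values copied from the Logs table.
LOGGED_LR = 4.976714865090827e-05
REWARD_CHOSEN = -33.849693298339844
REWARD_REJECTED = -114.72045135498047
LOGGED_MARGIN = 80.87075805664062

# Assumption: the log row corresponds to step 200 (inferred from the checkpoint name).
ASSUMED_STEP = 200


def cosine_lr(step: int) -> float:
    """Linear warmup, then half-cosine decay to zero (assumed schedule form)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare the reconstructed values with the logged ones, within a small tolerance.
lr_matches = math.isclose(cosine_lr(ASSUMED_STEP), LOGGED_LR, rel_tol=1e-6)
margin_matches = math.isclose(
    REWARD_CHOSEN - REWARD_REJECTED, LOGGED_MARGIN, rel_tol=1e-6
)

print(f"learning rate consistent with assumed schedule: {lr_matches}")
print(f"margin consistent with chosen - rejected: {margin_matches}")
```

If either check prints `False`, one of the assumptions above (the step number or the schedule form) most likely does not hold for this run.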