---
license: apache-2.0
library_name: transformers
base_model:
- nbeerbower/Mahou-1.5-mistral-nemo-12B-lorablated
datasets:
- jondurbin/gutenberg-dpo-v0.1
- nbeerbower/gutenberg2-dpo
- nbeerbower/gutenberg-moderne-dpo
---

![image/png](https://huggingface.co/nbeerbower/mistral-nemo-gutenberg3-12B/resolve/main/gutenberg3.png?download=true)

# mistral-nemo-gutenberg3-12B

[Mahou-1.5-mistral-nemo-12B-lorablated](https://huggingface.co/nbeerbower/Mahou-1.5-mistral-nemo-12B-lorablated) fine-tuned on [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1), [nbeerbower/gutenberg2-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg2-dpo), and [nbeerbower/gutenberg-moderne-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg-moderne-dpo).

### Method

[ORPO tuned](https://mlabonne.github.io/blog/posts/2024-04-19_Fine_tune_Llama_3_with_ORPO.html) on 8x A100 GPUs for 2 epochs.

QLoRA config:

```
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# QLoRA config: 4-bit NF4 quantization with double quantization.
# torch_dtype is set elsewhere in the training script (not shown here).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)

# LoRA config: adapters on all attention and MLP projection layers.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)
```

Training config:

```
from trl import ORPOConfig

# new_model (the run/output model name) is set elsewhere in the training script (not shown here).
orpo_args = ORPOConfig(
    run_name=new_model,
    learning_rate=8e-6,
    lr_scheduler_type="linear",
    max_length=4096,
    max_prompt_length=2048,
    max_completion_length=2048,
    beta=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=1,
    optim="paged_adamw_8bit",
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=10,
    max_grad_norm=10,
    report_to="wandb",
    output_dir="./results/",
    bf16=True,
    gradient_checkpointing=True,
)
```
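
As a rough aid to reading the training config above, the sketch below works out the effective batch size and how a fractional `eval_steps` translates into optimizer steps. It assumes plain data-parallel training across the 8 GPUs mentioned under Method, and it only approximates how the trainer counts steps. The dataset size (`num_examples = 4000`) and the helper function names are hypothetical placeholders for illustration; the real size of the combined Gutenberg datasets is not stated in this card.

```python
import math

# Values taken from the Method section and training config above.
num_gpus = 8                      # "8x A100"
per_device_train_batch_size = 2
gradient_accumulation_steps = 1
num_train_epochs = 2
eval_steps_ratio = 0.2            # eval_steps below 1 is treated as a fraction of total steps

# Hypothetical placeholder: the real dataset size is not given in this card.
num_examples = 4000


def effective_batch_size(gpus: int, per_device: int, grad_accum: int) -> int:
    """Examples consumed per optimizer step under data parallelism."""
    return gpus * per_device * grad_accum


def total_optimizer_steps(examples: int, batch: int, epochs: int) -> int:
    """Approximate number of optimizer steps over the whole run."""
    return math.ceil(examples / batch) * epochs


def eval_interval(total_steps: int, ratio: float) -> int:
    """Approximate number of optimizer steps between evaluations."""
    return max(1, math.ceil(total_steps * ratio))


batch = effective_batch_size(
    num_gpus, per_device_train_batch_size, gradient_accumulation_steps
)
steps = total_optimizer_steps(num_examples, batch, num_train_epochs)
interval = eval_interval(steps, eval_steps_ratio)

print(f"effective batch size: {batch}")
print(f"total optimizer steps (hypothetical dataset): {steps}")
print(f"evaluate roughly every {interval} steps")
```

With these settings the effective batch size is 16 sequences per optimizer step, and `eval_steps=0.2` gives about five evaluations over the run regardless of dataset size.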