Orca-Llama-3-8B-Instruct-DPO

Finetuned Llama 3 8B Instruct on Intel/orca_dpo_pairs using a single 3090 24GB. Data was formatted using the ChatML template.

GGUF quantizations can be found at RDson/Orca-Llama-3-8B-Instruct-DPO-GGUF.
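Since the model was trained on ChatML-formatted data, prompts at inference time should follow the same template. Below is a minimal sketch of the standard ChatML layout; the helper function and example messages are illustrative, not part of this repository.

```python
# Sketch of the ChatML prompt layout this model expects.
# The <|im_start|>/<|im_end|> tokens follow the standard ChatML
# convention; the exact tokenizer config is an assumption.

def chatml_prompt(messages):
    """Render a list of {role, content} dicts as a ChatML string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave the assistant turn open so the model completes it.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]
print(chatml_prompt(messages))
```

In practice, `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` from `transformers` does the same job if the tokenizer ships a ChatML chat template.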

ORPOConfig:

    learning_rate=1e-6,
    lr_scheduler_type="linear",
    max_length=1024,
    max_prompt_length=512,
    overwrite_output_dir=True,
    beta=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=35,
    report_to="wandb",
    output_dir="./results/",
    fp16=True,
    save_steps=50
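The hyperparameters above map onto `trl`'s `ORPOConfig` (a `TrainingArguments` subclass that adds ORPO-specific fields such as `beta`). A hedged sketch of how the config might be constructed, assuming a recent `trl` version; the surrounding model/dataset loading is not shown and would be specific to this training run:

```python
# Sketch only: reconstructs the ORPOConfig from the listed values.
# Assumes trl's ORPOConfig API; not a verbatim copy of the author's script.
from trl import ORPOConfig

orpo_config = ORPOConfig(
    learning_rate=1e-6,
    lr_scheduler_type="linear",
    max_length=1024,           # full prompt + completion length
    max_prompt_length=512,     # prompt portion only
    overwrite_output_dir=True,
    beta=0.1,                  # ORPO odds-ratio loss weight
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size of 8
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=0.2,            # fraction of total steps between evals
    logging_steps=1,
    warmup_steps=35,
    report_to="wandb",
    output_dir="./results/",
    fp16=True,
    save_steps=50,
)
```

The config would then be passed to an `ORPOTrainer` along with the base model, tokenizer, and the Intel/orca_dpo_pairs dataset.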
Model size: 8.03B params (Safetensors, FP16).