See axolotl config
axolotl version: 0.4.1
base_model: Qwen/Qwen2.5-7B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: Jennny/strict_mc_label
    conversation: qwen-7b-chat
    type: sharegpt
    split: "train"
    train_on_split: "train"
warmup_ratio: 0.05
val_set_size: 0.0
output_dir: ./prm
wandb_project: preference-models
# wandb_entity: domain-generalization
wandb_watch:
wandb_name: "qwen-7b-bs32_lr2e-6_prm"
wandb_log_model:
train_on_inputs: false
save_safetensors: true
#noisy_embedding_alpha: 10.0 # default for sharegpt type
dataset_prepared_path: ~/data/preference-models/last_run_prepared
dataset_processes: 48
#torch_compile: true
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
trust_remote_code: True
adapter:
lora_model_dir:
#lora_r: 32
#lora_alpha: 16
#lora_dropout: 0.05
#lora_target_linear: true
#lora_fan_in_fan_out:
gradient_checkpointing: True
#warmup_ratio: 0.1
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
#max_steps: 10
#optimizer: adamw_torch_fused
optimizer: paged_adamw_32bit
#lr_scheduler: constant_with_warmup
lr_scheduler: cosine
learning_rate: 2.0e-6
weight_decay: 0.0
max_grad_norm: 1.0
group_by_length: false
bf16: auto
fp16: false
tf32: true
early_stopping_patience:
local_rank:
logging_steps: 2
xformers_attention:
flash_attention: true
eval_steps:
eval_table_size:
eval_table_max_new_tokens:
#save_steps: 100
save_strategy: "epoch"
save_total_limit: 4
#save_safetensors: false
debug:
ddp: #true
deepspeed: #deepspeed/zero1.json # multi-gpu only
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
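Since the `adapter` field is left empty, this is a full fine-tune and the checkpoint written to `./prm` loads like any other Qwen2.5 causal LM. Below is a minimal inference sketch; the model path and the example prompt are assumptions for illustration, and the exact step-labeling prompt format expected by the Jennny/strict_mc_label data is not documented in this card.

```python
# Minimal usage sketch, assuming the checkpoint is available locally at ./prm
# (or substitute the Hub repo id). The prompt is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "./prm"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "Step 1: 2 + 2 = 4. Is this step correct?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```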
prm
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the Jennny/strict_mc_label dataset (see the axolotl config above). It achieves the following results on the evaluation set:
- Loss: 0.0455
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 3
- num_epochs: 2
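The total train batch size of 32 follows from the per-device micro-batch size, gradient accumulation, and device count listed above; a quick check:

```python
# Effective (total) train batch size implied by the hyperparameters above.
micro_batch_size = 1             # train_batch_size per device
gradient_accumulation_steps = 4
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)    # 32, matching total_train_batch_size above
```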
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 0.0296 | 1 | 3.5810 |
3.6289 | 0.0593 | 2 | 2.9033 |
3.6289 | 0.0889 | 3 | 1.3921 |
2.1197 | 0.1185 | 4 | 0.4445 |
2.1197 | 0.1481 | 5 | 0.2438 |
0.3612 | 0.1778 | 6 | 0.1210 |
0.3612 | 0.2074 | 7 | 0.0613 |
0.0928 | 0.2370 | 8 | 0.1151 |
0.0928 | 0.2667 | 9 | 0.0640 |
0.0827 | 0.2963 | 10 | 0.0762 |
0.0827 | 0.3259 | 11 | 0.0631 |
0.0682 | 0.3556 | 12 | 0.0576 |
0.0682 | 0.3852 | 13 | 0.0564 |
0.0509 | 0.4148 | 14 | 0.0546 |
0.0509 | 0.4444 | 15 | 0.0559 |
0.0579 | 0.4741 | 16 | 0.0539 |
0.0579 | 0.5037 | 17 | 0.0511 |
0.0509 | 0.5333 | 18 | 0.0535 |
0.0509 | 0.5630 | 19 | 0.0516 |
0.0495 | 0.5926 | 20 | 0.0504 |
0.0495 | 0.6222 | 21 | 0.0556 |
0.0509 | 0.6519 | 22 | 0.0559 |
0.0509 | 0.6815 | 23 | 0.0541 |
0.0995 | 0.7111 | 24 | 0.0495 |
0.0995 | 0.7407 | 25 | 0.0500 |
0.0473 | 0.7704 | 26 | 0.0502 |
0.0473 | 0.8 | 27 | 0.0503 |
0.0486 | 0.8296 | 28 | 0.0494 |
0.0486 | 0.8593 | 29 | 0.0492 |
0.0502 | 0.8889 | 30 | 0.0488 |
0.0502 | 0.9185 | 31 | 0.0493 |
0.071 | 0.9481 | 32 | 0.0483 |
0.071 | 0.9778 | 33 | 0.0477 |
0.0467 | 1.0074 | 34 | 0.0485 |
0.0467 | 1.0148 | 35 | 0.0492 |
0.0439 | 1.0444 | 36 | 0.0489 |
0.0439 | 1.0741 | 37 | 0.0483 |
0.0407 | 1.1037 | 38 | 0.0476 |
0.0407 | 1.1333 | 39 | 0.0468 |
0.0464 | 1.1630 | 40 | 0.0464 |
0.0464 | 1.1926 | 41 | 0.0460 |
0.0434 | 1.2222 | 42 | 0.0460 |
0.0434 | 1.2519 | 43 | 0.0465 |
0.0455 | 1.2815 | 44 | 0.0463 |
0.0455 | 1.3111 | 45 | 0.0461 |
0.048 | 1.3407 | 46 | 0.0460 |
0.048 | 1.3704 | 47 | 0.0459 |
0.0446 | 1.4 | 48 | 0.0458 |
0.0446 | 1.4296 | 49 | 0.0456 |
0.0481 | 1.4593 | 50 | 0.0457 |
0.0481 | 1.4889 | 51 | 0.0456 |
0.0432 | 1.5185 | 52 | 0.0456 |
0.0432 | 1.5481 | 53 | 0.0456 |
0.0416 | 1.5778 | 54 | 0.0456 |
0.0416 | 1.6074 | 55 | 0.0456 |
0.0424 | 1.6370 | 56 | 0.0455 |
0.0424 | 1.6667 | 57 | 0.0455 |
0.044 | 1.6963 | 58 | 0.0456 |
0.044 | 1.7259 | 59 | 0.0455 |
0.0422 | 1.7556 | 60 | 0.0455 |
0.0422 | 1.7852 | 61 | 0.0455 |
0.0419 | 1.8148 | 62 | 0.0455 |
0.0419 | 1.8444 | 63 | 0.0455 |
0.0431 | 1.8741 | 64 | 0.0455 |
0.0431 | 1.9037 | 65 | 0.0456 |
0.0396 | 1.9333 | 66 | 0.0455 |
Framework versions
- Transformers 4.43.3
- Pytorch 2.1.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
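When reproducing the training or inference environment, the installed versions can be compared against those listed above; a small sketch:

```python
# Check installed versions against the card's listed framework versions
# (expected values shown in comments).
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # 4.43.3
print(torch.__version__)         # 2.1.2+cu121
print(datasets.__version__)      # 2.19.1
print(tokenizers.__version__)    # 0.19.1
```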