---
license: apache-2.0
datasets:
- nicholasKluge/reward-aira-dataset
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---

# Aira-2-124M-DPO-checkpoint-200

## Hyperparameters

```yaml
model_args:
  base_model: "nicholasKluge/Aira-2-124M"
  model_ref: "nicholasKluge/Aira-2-124M"
  cache_dir: null
data_args:
  dataset_name: "nicholasKluge/reward-aira-dataset"
  dataset_split: "english"
  validation_split_percentage: null
  streaming: false
  max_prompt_length: 150
  max_length: 600
  sanity_check: false
training_args:
  output_dir: "checkpoints"
  do_eval: false
  evaluation_strategy: "no"
  save_strategy: "steps"
  logging_strategy: "steps"
  logging_steps: 200
  max_steps: 2400
  save_steps: 200
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  gradient_accumulation_steps: 1
  gradient_checkpointing: false
  optim: "adamw_torch"
  learning_rate: 0.00005
  lr_scheduler_type: "cosine"
  warmup_steps: 100
  hub_token: null
  push_to_hub: false
  hub_model_id: null
extra_args:
  project_name: "Aira-2"
  wandb_token: null
  beta: 0.8
```
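To illustrate what `beta: 0.8` controls, here is a minimal sketch of the per-pair DPO objective. It assumes the standard sigmoid DPO loss; the card does not say which loss variant was used, and the function names and input log-probabilities below are hypothetical, chosen only for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.8):
    """Sigmoid DPO loss for one (chosen, rejected) pair.

    Each implicit reward is beta times the log-probability gap between
    the policy being trained and the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    loss = -log_sigmoid(margin)
    return loss, chosen_reward, rejected_reward


# Hypothetical sequence log-probabilities, not taken from the training run.
loss, chosen_reward, rejected_reward = dpo_loss(
    policy_chosen_logp=-380.0,
    policy_rejected_logp=-410.0,
    ref_chosen_logp=-385.0,
    ref_rejected_logp=-400.0,
    beta=0.8,
)
print(loss, chosen_reward, rejected_reward)
```

A larger `beta` scales both implicit rewards, so the same log-probability gap yields a larger margin and a smaller loss for pairs the policy already ranks correctly.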

## Logs

| Key | Value |
|-----------------------|---------------------------------|
| loss | 0.2274 |
| learning_rate | 4.976714865090827e-05 |
| rewards/chosen | -33.849693298339844 |
| rewards/rejected | -114.72045135498047 |
| rewards/accuracies | 0.9768750071525574 |
| rewards/margins | 80.87075805664062 |
| logps/rejected | -404.8834228515625 |
| logps/chosen | -383.7469482421875 |
| logits/rejected | -67.6454086303711 |
| logits/chosen | -30.543472290039062 |
| epoch | 0.05 |
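Two of the logged values can be sanity-checked against the hyperparameters. The sketch below assumes that `lr_scheduler_type: "cosine"` means linear warmup followed by a half-cosine decay to zero over `max_steps`, and that this log row was written at step 200 (the checkpoint number in the title). The function name is hypothetical.

```python
import math

BASE_LR = 5e-5       # learning_rate
WARMUP_STEPS = 100   # warmup_steps
MAX_STEPS = 2400     # max_steps


def cosine_lr_with_warmup(step, base_lr=BASE_LR,
                          warmup_steps=WARMUP_STEPS, max_steps=MAX_STEPS):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at step 200, to compare with the logged learning_rate.
lr_at_200 = cosine_lr_with_warmup(200)
print(lr_at_200)

# rewards/margins should equal rewards/chosen minus rewards/rejected.
margin = -33.849693298339844 - (-114.72045135498047)
print(margin)
```

Under these assumptions, both values agree with the table to within floating-point precision. The logged loss is an average of per-pair losses, so it is not expected to equal the loss evaluated at the mean margin.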


## Eval

| Task |Version| Metric |Value | |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge| 0|acc |0.2031|± |0.0118|
| | |acc_norm|0.2491|± |0.0126|
|toxigen | 0|acc |0.5521|± |0.0162|
| | |acc_norm|0.4340|± |0.0162|
|truthfulqa_mc| 1|mc1 |0.2485|± |0.0151|
| | |mc2 |0.4368|± |0.0153|
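
As a reading aid, the sketch below turns each `Value ± Stderr` pair into an approximate 95% interval. It assumes the `Stderr` column is the standard error of the metric and that a normal approximation is reasonable; the helper name is hypothetical.

```python
# (task, metric) -> (value, stderr), copied from the Eval table.
RESULTS = {
    ("arc_challenge", "acc"): (0.2031, 0.0118),
    ("arc_challenge", "acc_norm"): (0.2491, 0.0126),
    ("toxigen", "acc"): (0.5521, 0.0162),
    ("toxigen", "acc_norm"): (0.4340, 0.0162),
    ("truthfulqa_mc", "mc1"): (0.2485, 0.0151),
    ("truthfulqa_mc", "mc2"): (0.4368, 0.0153),
}


def confidence_interval(value, stderr, z=1.96):
    """Normal-approximation interval: value ± z * stderr."""
    half_width = z * stderr
    return value - half_width, value + half_width


for (task, metric), (value, stderr) in RESULTS.items():
    low, high = confidence_interval(value, stderr)
    print(f"{task:14s} {metric:9s} {value:.4f}  [{low:.4f}, {high:.4f}]")
```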