---
license: apache-2.0
datasets:
- nicholasKluge/reward-aira-dataset
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---

# Aira-2-124M-DPO-checkpoint-200
 
## Hyperparameters

```yaml
model_args:
  base_model: "nicholasKluge/Aira-2-124M"
  model_ref: "nicholasKluge/Aira-2-124M"
  cache_dir: null
data_args:
  dataset_name: "nicholasKluge/reward-aira-dataset"
  dataset_split: "english"
  validation_split_percentage: null
  streaming: false
  max_prompt_length: 150
  max_length: 600
  sanity_check: false
training_args:
  output_dir: "checkpoints"
  do_eval: false
  evaluation_strategy: "no"
  save_strategy: "steps"
  logging_strategy: "steps"
  logging_steps: 200
  max_steps: 2400
  save_steps: 200
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  gradient_accumulation_steps: 1
  gradient_checkpointing: false
  optim: "adamw_torch"
  learning_rate: 0.00005
  lr_scheduler_type: "cosine"
  warmup_steps: 100
  hub_token: null
  push_to_hub: false
  hub_model_id: null
extra_args:
  project_name: "Aira-2"
  wandb_token: null
  beta: 0.9
```
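
As a rough cross-check of the schedule settings above, the sketch below computes a cosine learning-rate schedule with linear warmup from `learning_rate`, `warmup_steps`, and `max_steps`. It assumes the common formulation (linear ramp from zero, then half a cosine period down to zero); the function names `lr_at` and `examples_per_device` are hypothetical and are not taken from the training script.

```python
import math

# Values copied from training_args above.
BASE_LR = 5e-5
WARMUP_STEPS = 100
MAX_STEPS = 2400
PER_DEVICE_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 1


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (assumed cosine-with-warmup form)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    # Half a cosine period: BASE_LR at progress 0, zero at progress 1.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def examples_per_device(step: int) -> int:
    """Training examples consumed by one device after `step` optimizer steps."""
    return step * PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


if __name__ == "__main__":
    print(f"lr at step 200: {lr_at(200):.6e}")
    print(f"examples per device at step 200: {examples_per_device(200)}")
```

Under these assumptions, the value at step 200 comes out close to the `learning_rate` reported in the logs for this checkpoint. The number of devices is not stated here, so only a per-device example count is computed.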

## Logs

| Key                   | Value                           |
|-----------------------|---------------------------------|
| loss                  | 0.2274                          |
| learning_rate         | 4.976714865090827e-05           |
| rewards/chosen        | -33.849693298339844             |
| rewards/rejected      | -114.72045135498047             |
| rewards/accuracies    | 0.9768750071525574              |
| rewards/margins       | 80.87075805664062               |
| logps/rejected        | -404.8834228515625              |
| logps/chosen          | -383.7469482421875              |
| logits/rejected       | -67.6454086303711               |
| logits/chosen         | -30.543472290039062             |
| epoch                 | 0.05                            |
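
The reward columns can be related to each other with a little arithmetic. The sketch below assumes the standard sigmoid DPO formulation, in which the implicit reward is `beta` times the difference between policy and reference log-probabilities and the per-example loss is `-log(sigmoid(margin))`. The helper names are hypothetical; this is an illustration of the formula, not the training code that produced the table.

```python
import math

BETA = 0.9  # `beta` from extra_args in the hyperparameters


def implicit_reward(policy_logp: float, ref_logp: float, beta: float = BETA) -> float:
    """Implicit DPO reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def dpo_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Per-example sigmoid DPO loss, -log(sigmoid(margin)), computed stably."""
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Logged values from the table above.
rewards_chosen = -33.849693298339844
rewards_rejected = -114.72045135498047

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected

if __name__ == "__main__":
    print(f"margin: {margin:.4f}")
    print(f"loss with no preference (margin 0): {dpo_loss(0.0, 0.0):.4f}")
    print(f"loss at the logged margin: {dpo_loss(rewards_chosen, rewards_rejected):.3e}")
```

The computed margin agrees with the logged `rewards/margins`. The loss evaluated at that margin is close to zero, while the logged `loss` is 0.2274. One plausible explanation is that the logged figures are averages over many examples, and the average of per-example losses need not equal the loss at the average margin.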


## Eval

|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.2125|±  |0.0120|
|             |       |acc_norm|0.2466|±  |0.0126|
|toxigen      |      0|acc     |0.5479|±  |0.0162|
|             |       |acc_norm|0.4309|±  |0.0162|
|truthfulqa_mc|      1|mc1     |0.2546|±  |0.0153|
|             |       |mc2     |0.4261|±  |0.0154|
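
To read the `±` column, one option is to turn each value and standard error into an approximate 95% interval. The sketch below assumes a normal approximation (value ± 1.96 × stderr); the `confidence_interval` helper is hypothetical and is not part of the evaluation tooling.

```python
# (task, metric) -> (value, stderr), copied from the table above.
RESULTS = {
    ("arc_challenge", "acc"): (0.2125, 0.0120),
    ("arc_challenge", "acc_norm"): (0.2466, 0.0126),
    ("toxigen", "acc"): (0.5479, 0.0162),
    ("toxigen", "acc_norm"): (0.4309, 0.0162),
    ("truthfulqa_mc", "mc1"): (0.2546, 0.0153),
    ("truthfulqa_mc", "mc2"): (0.4261, 0.0154),
}


def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate interval value ± z * stderr (normal approximation assumed)."""
    half_width = z * stderr
    return value - half_width, value + half_width


if __name__ == "__main__":
    for (task, metric), (value, stderr) in RESULTS.items():
        low, high = confidence_interval(value, stderr)
        print(f"{task:14s} {metric:9s} {value:.4f}  [{low:.4f}, {high:.4f}]")
```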