# qwen2.5-3b-expo-L2EXPO-25-3b-1
This model is a fine-tuned version of hZzy/qwen2.5-3b-sft3-25-2 on the hZzy/train_pairwise_all_new4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.4745
- Objective: 0.4683
- Reward Accuracy: 0.6228
- Logp Accuracy: 0.6055
- Log Diff Policy: 6.6940
- Chosen Logps: -92.7974
- Rejected Logps: -99.4915
- Chosen Rewards: -0.2833
- Rejected Rewards: -0.3471
- Logits: -1.2886
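
To try the checkpoint, the standard Hugging Face Transformers loading path applies. The following is a minimal sketch: the repo id is assumed to match this card's title, and the generation settings are illustrative rather than the evaluation configuration.

```python
# Minimal loading sketch; repo id assumed from the card title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-3b-expo-L2EXPO-25-3b-1"  # assumption

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fits a 3B model comfortably on one modern GPU
    device_map="auto",
)

prompt = "Explain pairwise preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```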
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of the equivalent `TrainingArguments` follows the list):
- learning_rate: 1e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 72
- total_eval_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
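
The mapping below shows how these values would translate into Transformers `TrainingArguments`. It is a minimal sketch: the actual EXPO preference-training script is not included on this card, so the output directory and precision flag are assumptions.

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
# The actual preference-training script is not published on this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-3b-expo-L2EXPO-25-3b-1",  # assumed output directory
    learning_rate=1e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=12,  # 6 GPUs x 1 per device x 12 steps = 72 total
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption; mixed-precision setting is not stated on the card
)
```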
### Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5438 | 0.1577 | 200 | 0.5090 | 0.5094 | 0.5266 | 0.5210 | 0.9004 | -73.2070 | -74.1074 | -0.0874 | -0.0932 | -0.8639 |
| 0.4885 | 0.3154 | 400 | 0.4999 | 0.5020 | 0.5579 | 0.5546 | 1.9614 | -78.4734 | -80.4348 | -0.1400 | -0.1565 | -0.9730 |
| 0.4875 | 0.4731 | 600 | 0.4911 | 0.4929 | 0.5741 | 0.5641 | 3.4973 | -92.2967 | -95.7940 | -0.2783 | -0.3101 | -1.0447 |
| 0.4575 | 0.6308 | 800 | 0.4835 | 0.4798 | 0.5976 | 0.5965 | 5.5421 | -94.7561 | -100.2982 | -0.3029 | -0.3551 | -1.1891 |
| 0.4794 | 0.7885 | 1000 | 0.4771 | 0.4715 | 0.6139 | 0.5976 | 6.4893 | -98.6883 | -105.1776 | -0.3422 | -0.4039 | -1.2019 |
| 0.4268 | 0.9462 | 1200 | 0.4757 | 0.4708 | 0.6161 | 0.5965 | 6.4884 | -94.2826 | -100.7710 | -0.2981 | -0.3599 | -1.2718 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
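
If you are reproducing results, a quick environment check against the versions above can save debugging time. The snippet below is a sketch that only compares major/minor versions; the CUDA build suffix (`+cu124`) is not checked.

```python
# Sanity-check installed versions against the card's framework list.
import transformers, torch, datasets, tokenizers

for module, expected in [
    (transformers, "4.42"),
    (torch, "2.6"),
    (datasets, "3.2"),
    (tokenizers, "0.19"),
]:
    assert module.__version__.startswith(expected), (
        f"{module.__name__} {module.__version__} != {expected}.x"
    )
print("environment matches the card's framework versions")
```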