avid9 committed
Commit f401a81 · verified · 1 Parent(s): 53f31b4

Training in progress, step 10

Files changed (3):
  1. README.md +1 -1
  2. adapter_model.safetensors +1 -1
  3. training_args.bin +1 -1
README.md CHANGED
@@ -27,7 +27,7 @@ print(output["generated_text"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://adobesensei.wandb.io/avijitd/qwen2_5-7b-instruct-trl-dpo/runs/9szwo4hy)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://adobesensei.wandb.io/avijitd/qwen2_5-7b-instruct-trl-dpo/runs/qcw77mxq)
 
 
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
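As background on the DPO method the README references: on a single preference pair, the DPO objective reduces to a logistic loss on the gap between the policy's and the reference model's log-probability margins. A minimal sketch in plain Python — the function name and the log-probability values are illustrative, not taken from this repository:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen response more strongly than the
# reference does, the margin is positive and the loss falls below log(2).
print(round(dpo_loss(-10.0, -14.0, -11.0, -13.0), 4))  # → 0.5981
```

With identical policy and reference log-probabilities the margin is zero and the loss sits at log(2) ≈ 0.6931, which is the starting point of training before the policy drifts from the reference.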
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5c5dc42e4880d99c9bce4437bc0c3aaeae8a12c08ca3ad5c1b4593c6ce9ebe51
+oid sha256:871fe891539e6235e9ff884e78b7a16196a87fd3034b9dbf980705c110cade7c
 size 10108960
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:349f92007000e4a05ec3c437046cf2922f3164e026859ba9a96f4f68fa4ffda2
+oid sha256:b0f2d14e33155e4d02d3baedbd66fcb1ed6db898854c395d7808ca5633c30dec
 size 6737
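The `adapter_model.safetensors` and `training_args.bin` entries above are git-lfs pointer files (`version`, `oid sha256:…`, `size` fields) rather than the binaries themselves; the commit only swaps the sha256 digest while the sizes stay constant. A minimal sketch of checking a downloaded payload against such a pointer — the helper names are hypothetical:

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a git-lfs pointer file into its version, oid, and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}

def matches_pointer(pointer, payload):
    """Check payload bytes against the pointer's size and sha256 oid."""
    return (len(payload) == pointer["size"]
            and hashlib.sha256(payload).hexdigest() == pointer["oid"])

# Build a pointer for a toy payload and verify the round trip.
payload = b"example weights"
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
print(matches_pointer(pointer, payload))  # → True
```

A size check alone is not enough here — both `training_args.bin` revisions are exactly 6737 bytes — so the sha256 comparison is what actually distinguishes the old and new artifacts.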