Training in progress, step 10

Files changed (3)

README.md CHANGED

@@ -27,7 +27,7 @@ print(output["generated_text"])
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://adobesensei.wandb.io/avijitd/qwen2_5-7b-instruct-trl-dpo/runs/9szwo4hy)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://adobesensei.wandb.io/avijitd/qwen2_5-7b-instruct-trl-dpo/runs/qcw77mxq)
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5c5dc42e4880d99c9bce4437bc0c3aaeae8a12c08ca3ad5c1b4593c6ce9ebe51
+oid sha256:871fe891539e6235e9ff884e78b7a16196a87fd3034b9dbf980705c110cade7c
 size 10108960

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:349f92007000e4a05ec3c437046cf2922f3164e026859ba9a96f4f68fa4ffda2
+oid sha256:b0f2d14e33155e4d02d3baedbd66fcb1ed6db898854c395d7808ca5633c30dec
 size 6737