Update README.md
README.md
CHANGED
@@ -11,16 +11,13 @@ licence: license
 
 # Model Card for Qwen2.5-0.5B-DPO
 
-Fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) to generate YouTube titles based on my preferences.
+Fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) to generate YouTube titles based on my preferences. It was trained using [TRL](https://github.com/huggingface/trl).
 
 Video link: coming soon! <br>
 [Blog link](https://shawhin.medium.com/fine-tuning-llms-on-human-feedback-rlhf-dpo-1c693dbc4cbf) <br>
 [GitHub Repo](https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/dpo) <br>
 [Training Dataset](https://huggingface.co/datasets/shawhin/youtube-titles-dpo)
 
-This model is a .
-It has been trained using [TRL](https://github.com/huggingface/trl).
-
 ## Quick start
 
 ```python