lbourdois committed
Commit a45ed11 · verified · 1 Parent(s): 63b8a11

Improve language tag


Hi! Since the model is multilingual, this PR adds languages other than English to the `language` tag to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.
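Not part of this PR, just for reference: a minimal sketch of how the same `language` metadata could be applied programmatically instead of editing the YAML front matter by hand. It assumes the `huggingface_hub` `ModelCard`/`ModelCardData` API (`ModelCard.load`, `data.to_yaml`, `push_to_hub`); exact behaviour should be checked against the installed version.

```python
# Sketch only: update the `language` field of the model card metadata
# programmatically (assumes huggingface_hub's ModelCard API).
from huggingface_hub import ModelCard

card = ModelCard.load("shawhin/Qwen2.5-0.5B-DPO")
card.data.language = [
    "zho", "eng", "fra", "spa", "por", "deu", "ita",
    "rus", "jpn", "kor", "vie", "tha", "ara",
]
print(card.data.to_yaml())  # inspect the updated front matter before pushing
# card.push_to_hub("shawhin/Qwen2.5-0.5B-DPO")  # requires write access to the repo
```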

Files changed (1)
  1. README.md +85 -71
README.md CHANGED
@@ -1,72 +1,86 @@
- ---
- base_model: Qwen/Qwen2.5-0.5B-Instruct
- library_name: transformers
- model_name: Qwen2.5-0.5B-DPO
- tags:
- - generated_from_trainer
- - trl
- - dpo
- licence: license
- ---
-
- # Model Card for Qwen2.5-0.5B-DPO
-
- Fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) to generate YouTube titles based on my preferences. It was trained using [TRL](https://github.com/huggingface/trl).
-
- [Video link](https://youtu.be/bbVoDXoPrPM) <br>
- [Blog link](https://shawhin.medium.com/fine-tuning-llms-on-human-feedback-rlhf-dpo-1c693dbc4cbf) <br>
- [GitHub Repo](https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/dpo) <br>
- [Training Dataset](https://huggingface.co/datasets/shawhin/youtube-titles-dpo)
-
- ## Quick start
-
- ```python
- from transformers import pipeline
-
- video_idea = "independent component analysis intro"
- prompt = f"<|im_start|>user\n{video_idea}<|im_end|>\n<|im_start|>assistant\n"
-
- generator = pipeline("text-generation", model="shawhin/Qwen2.5-0.5B-DPO", device="cuda")
- outputs = generator(prompt, max_length=100, truncation=True, num_return_sequences=1, temperature=0.7)
- print(outputs[0]['generated_text'])
- ```
-
- ## Training procedure
-
- This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
-
- ### Framework versions
-
- - TRL: 0.15.1
- - Transformers: 4.48.0
- - Pytorch: 2.6.0
- - Datasets: 3.3.1
- - Tokenizers: 0.21.0
-
- ## Citations
-
- Cite DPO as:
-
- ```bibtex
- @inproceedings{rafailov2023direct,
- title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
- author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
- year = 2023,
- booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
- url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
- editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
- }
- ```
-
- Cite TRL as:
-
- ```bibtex
- @misc{vonwerra2022trl,
- title = {{TRL: Transformer Reinforcement Learning}},
- author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
- year = 2020,
- journal = {GitHub repository},
- publisher = {GitHub},
- howpublished = {\url{https://github.com/huggingface/trl}}
- }
 
+ ---
+ base_model: Qwen/Qwen2.5-0.5B-Instruct
+ library_name: transformers
+ model_name: Qwen2.5-0.5B-DPO
+ tags:
+ - generated_from_trainer
+ - trl
+ - dpo
+ licence: license
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+
+ # Model Card for Qwen2.5-0.5B-DPO
+
+ Fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) to generate YouTube titles based on my preferences. It was trained using [TRL](https://github.com/huggingface/trl).
+
+ [Video link](https://youtu.be/bbVoDXoPrPM) <br>
+ [Blog link](https://shawhin.medium.com/fine-tuning-llms-on-human-feedback-rlhf-dpo-1c693dbc4cbf) <br>
+ [GitHub Repo](https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/dpo) <br>
+ [Training Dataset](https://huggingface.co/datasets/shawhin/youtube-titles-dpo)
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ video_idea = "independent component analysis intro"
+ prompt = f"<|im_start|>user\n{video_idea}<|im_end|>\n<|im_start|>assistant\n"
+
+ generator = pipeline("text-generation", model="shawhin/Qwen2.5-0.5B-DPO", device="cuda")
+ outputs = generator(prompt, max_length=100, truncation=True, num_return_sequences=1, temperature=0.7)
+ print(outputs[0]['generated_text'])
+ ```
+
+ ## Training procedure
+
+ This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
+
+ ### Framework versions
+
+ - TRL: 0.15.1
+ - Transformers: 4.48.0
+ - Pytorch: 2.6.0
+ - Datasets: 3.3.1
+ - Tokenizers: 0.21.0
+
+ ## Citations
+
+ Cite DPO as:
+
+ ```bibtex
+ @inproceedings{rafailov2023direct,
+ title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
+ author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
+ year = 2023,
+ booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
+ url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
+ editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
+ }
+ ```
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
+ }
  ```
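
For context on the "Training procedure" section in the diff above, here is a minimal sketch of what DPO fine-tuning with TRL typically looks like for this setup. The base model and dataset ids come from the model card, but the hyperparameters and the exact `DPOConfig`/`DPOTrainer` arguments are assumptions (written against the TRL 0.15.x API) and are not the values actually used for this model.

```python
# Minimal DPO fine-tuning sketch with TRL (assumptions: TRL ~0.15 API;
# hyperparameters are illustrative, not the values used for this model).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference pairs (prompt / chosen / rejected) from the model card's dataset.
dataset = load_dataset("shawhin/youtube-titles-dpo")

training_args = DPOConfig(
    output_dir="Qwen2.5-0.5B-DPO",
    per_device_train_batch_size=8,   # assumed value
    num_train_epochs=1,              # assumed value
    learning_rate=5e-6,              # assumed value
    beta=0.1,                        # DPO KL-penalty strength, assumed value
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    processing_class=tokenizer,
)
trainer.train()
```

When `ref_model` is not passed, TRL creates a frozen reference copy of `model` internally, which matches the standard DPO setup described in the paper cited in the model card.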