snoels committed
Commit ccf9c27 • 1 Parent(s): e513ba2

Update README.md

Files changed (1): README.md (+127 -26)

README.md CHANGED
@@ -1,5 +1,6 @@
  ---
- base_model: BramVanroy/GEITje-7B-ultra
  datasets:
  - BramVanroy/ultra_feedback_dutch
  library_name: peft
@@ -8,44 +9,42 @@ tags:
  - trl
  - dpo
  - generated_from_trainer
  model-index:
- - name: FinGEITje-7B-dpo
    results: []
  language:
  - nl
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/snoels/huggingface/runs/yng7mdb0)
- # FinGEITje-7B-dpo

- This model is a fine-tuned version of [/mnt/trained_models/fingeitje](https://huggingface.co//mnt/trained_models/fingeitje) on the BramVanroy/ultra_feedback_dutch dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.0279
- - Rewards/chosen: -3.8986
- - Rewards/rejected: -15.9713
- - Rewards/accuracies: 0.9836
- - Rewards/margins: 12.0727
- - Logps/rejected: -1952.6360
- - Logps/chosen: -789.0983
- - Logits/rejected: -1.7369
- - Logits/chosen: -1.8936

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

  ### Training hyperparameters

@@ -76,11 +75,113 @@ The following hyperparameters were used during training:
  | 0.0352 | 0.7962 | 600 | 0.0278 | -3.8104 | -15.6430 | 0.9836 | 11.8327 | -1919.8119 | -780.2752 | -1.7437 | -1.8978 |
  | 0.0238 | 0.9289 | 700 | 0.0279 | -3.8974 | -15.9642 | 0.9828 | 12.0668 | -1951.9310 | -788.9780 | -1.7371 | -1.8937 |

-
  ### Framework versions

  - PEFT 0.11.1
  - Transformers 4.42.4
  - Pytorch 2.3.1
  - Datasets 2.20.0
- - Tokenizers 0.19.1

  ---
+ license: cc-by-nc-4.0
+ base_model: snoels/FinGEITje-7B-sft
  datasets:
  - BramVanroy/ultra_feedback_dutch
  library_name: peft

  - trl
  - dpo
  - generated_from_trainer
+ - geitje
+ - fingeitje
+ - dutch
+ - nl
+ - finance
  model-index:
+ - name: snoels/FinGEITje-7B-dpo
    results: []
  language:
  - nl
+ pipeline_tag: text-generation
+ inference: false
  ---

  [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/snoels/huggingface/runs/yng7mdb0)

+ <p align="center" style="margin:0;padding:0">
+   <img src="https://huggingface.co/snoels/FinGEITje-7B-dpo/resolve/main/fingeitje-banner-dpo.png" alt="FinGEITje DPO Banner" width="1000"/>
+ </p>

+ <div style="margin:auto; text-align:center">
+   <h1 style="margin-bottom: 0; font-size: 2em;">🐐 FinGEITje 7B DPO</h1>
+   <em style="font-size: 1em;">A large open Dutch financial language model aligned through AI feedback.</em>
+ </div>

+ This model is a fine-tuned version of [snoels/FinGEITje-7B-sft](https://huggingface.co/snoels/FinGEITje-7B-sft) on the [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) dataset.

+ ## 📖 Model Description

+ [FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) is a large open Dutch financial language model with 7 billion parameters, based on Mistral 7B. It has been further trained using **Direct Preference Optimization (DPO)** on AI-generated preference data, aligning the model's responses with human-like preferences in the Dutch language. This alignment process enhances the model's ability to generate more helpful, coherent, and user-aligned responses in financial contexts.
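+
+ The alignment stage can be sketched with TRL's `DPOTrainer`. The snippet below is illustrative only, assuming a recent TRL release: the model and dataset names match this card, but the hyperparameters and LoRA settings are placeholder assumptions, not the actual training configuration (see "Training hyperparameters" below).
+
+ ```python
+ # Illustrative DPO setup with TRL — a sketch, not the exact training script.
+ from datasets import load_dataset
+ from peft import LoraConfig
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from trl import DPOConfig, DPOTrainer
+
+ model = AutoModelForCausalLM.from_pretrained("snoels/FinGEITje-7B-sft")
+ tokenizer = AutoTokenizer.from_pretrained("snoels/FinGEITje-7B-sft")
+
+ # Synthetic Dutch preference data: prompts with chosen/rejected responses
+ dataset = load_dataset("BramVanroy/ultra_feedback_dutch", split="train")
+
+ args = DPOConfig(
+     output_dir="fingeitje-7b-dpo",
+     beta=0.1,                      # strength of the implicit KL penalty (assumed)
+     learning_rate=5e-6,            # placeholder value
+     per_device_train_batch_size=4,
+ )
+
+ trainer = DPOTrainer(
+     model=model,
+     ref_model=None,                # with PEFT, the frozen base acts as reference
+     args=args,
+     train_dataset=dataset,
+     tokenizer=tokenizer,
+     peft_config=LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32),
+ )
+ trainer.train()
+ ```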
+
+ ## 📊 Training
+
+ ### Training Data
+
+ [FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) was fine-tuned on the [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) dataset, which consists of synthetic preference data in Dutch. This dataset includes prompts along with preferred and less preferred responses, allowing the model to learn to generate more aligned responses through DPO.
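+
+ As a quick sanity check, the preference data can be inspected directly. A minimal sketch (the exact column layout of the dataset is an assumption here):
+
+ ```python
+ from datasets import load_dataset
+
+ # Peek at the Dutch preference pairs used for DPO fine-tuning
+ ds = load_dataset("BramVanroy/ultra_feedback_dutch", split="train")
+
+ # A DPO-style record is expected to carry a prompt plus a preferred
+ # ("chosen") and a less preferred ("rejected") response.
+ print(ds.column_names)
+ print(ds[0])
+ ```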

  ### Training hyperparameters

  | 0.0352 | 0.7962 | 600 | 0.0278 | -3.8104 | -15.6430 | 0.9836 | 11.8327 | -1919.8119 | -780.2752 | -1.7437 | -1.8978 |
  | 0.0238 | 0.9289 | 700 | 0.0279 | -3.8974 | -15.9642 | 0.9828 | 12.0668 | -1951.9310 | -788.9780 | -1.7371 | -1.8937 |

  ### Framework versions

  - PEFT 0.11.1
  - Transformers 4.42.4
  - Pytorch 2.3.1
  - Datasets 2.20.0
+ - Tokenizers 0.19.1
+
+ ## 🛠️ How to Use
+
+ [FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) can be used with the Hugging Face Transformers library together with PEFT to load the adapters efficiently.
+
+ ### Installation
+
+ Ensure you have the necessary libraries installed:
+
+ ```bash
+ pip install torch transformers peft accelerate
+ ```
+
+ ### Loading the Model
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from peft import PeftModel
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("BramVanroy/GEITje-7B-ultra", use_fast=False)
+
+ # Load the base model
+ base_model = AutoModelForCausalLM.from_pretrained("BramVanroy/GEITje-7B-ultra", device_map='auto')
+
+ # Attach the FinGEITje-7B-dpo PEFT adapters on top of the base model
+ model = PeftModel.from_pretrained(base_model, "snoels/FinGEITje-7B-dpo", device_map='auto')
+ ```
+
+ ### Generating Text
+
+ ```python
+ # Prepare the input
+ input_text = "Wat zijn de laatste trends in de Nederlandse banksector?"
+ input_ids = tokenizer.encode(input_text, return_tensors='pt').to(model.device)
+
+ # Generate a response
+ outputs = model.generate(input_ids, max_length=200, num_return_sequences=1)
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ print(response)
+ ```
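+
+ Since the model was aligned on conversational preference data, formatting the prompt with the tokenizer's chat template may give better results. A sketch, assuming the tokenizer ships a chat template:
+
+ ```python
+ # Build a chat-formatted prompt (assumes the tokenizer defines a chat template)
+ messages = [
+     {"role": "user", "content": "Wat zijn de laatste trends in de Nederlandse banksector?"}
+ ]
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ # Generate, then strip the prompt tokens from the decoded output
+ outputs = model.generate(input_ids, max_new_tokens=200)
+ print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```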
+
+ ## 🙏 Acknowledgements
+
+ We would like to thank:
+
+ - **Rijgersberg** ([GitHub](https://github.com/Rijgersberg)) for creating [GEITje](https://github.com/Rijgersberg/GEITje), one of the first Dutch foundation models.
+ - **Bram Vanroy** ([GitHub](https://github.com/BramVanroy)) for creating [GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra) and providing the ultra_feedback_dutch dataset.
+ - **Contributors of the [Alignment Handbook](https://github.com/huggingface/alignment-handbook)** for providing valuable resources that guided the development and training process of [FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo).
+
+ ## 📝 Citation
+
+ If you use [FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) in your work, please cite:
+
+ ```bibtex
+ @article{FinGEITje2024,
+   title={A Dutch Financial Large Language Model},
+   author={Noels, Sander and De Blaere, Jorne and De Bie, Tijl},
+   journal={arXiv preprint arXiv:xxxx.xxxxx},
+   year={2024},
+   url={https://arxiv.org/abs/xxxx.xxxxx}
+ }
+ ```
+
+ ## 📜 License
+
+ This model is licensed under the [Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/) license.
+
+ ## 📧 Contact
+
+ For any inquiries or questions, please contact [Sander Noels](mailto:[email protected]).