Update README.md
---
license: cc-by-nc-4.0
base_model: snoels/FinGEITje-7B-sft
datasets:
- BramVanroy/ultra_feedback_dutch
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
- geitje
- fingeitje
- dutch
- nl
- finance
model-index:
- name: snoels/FinGEITje-7B-dpo
  results: []
language:
- nl
pipeline_tag: text-generation
inference: false
---
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/snoels/huggingface/runs/yng7mdb0)

<p align="center" style="margin:0;padding:0">
  <img src="https://huggingface.co/snoels/FinGEITje-7B-dpo/resolve/main/fingeitje-banner-dpo.png" alt="FinGEITje DPO Banner" width="1000"/>
</p>

<div style="margin:auto; text-align:center">
  <h1 style="margin-bottom: 0; font-size: 2em;">FinGEITje 7B DPO</h1>
  <em style="font-size: 1em;">A large open Dutch financial language model aligned through AI feedback.</em>
</div>

This model is a fine-tuned version of [snoels/FinGEITje-7B-sft](https://huggingface.co/snoels/FinGEITje-7B-sft) on the [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) dataset.

## Model Description

[FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) is a large open Dutch financial language model with 7 billion parameters, based on Mistral 7B. It has been further trained using **Direct Preference Optimization (DPO)** on AI-generated preference data, aligning the model's responses with human-like preferences in Dutch. This alignment process enhances the model's ability to generate more helpful, coherent, and user-aligned responses in financial contexts.
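To make the alignment objective concrete, here is a minimal sketch of the per-pair DPO loss. This is illustrative only: the model's `trl` tag indicates training through TRL's DPO setup, which operates on batched token-level log-probabilities rather than scalars like these.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (policy_margin - ref_margin))).

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    z = beta * (policy_margin - ref_margin)
    # Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))

# The loss is small when the policy widens the chosen-vs-rejected margin
# relative to the reference, and large when it narrows or inverts it.
aligned = dpo_loss(-10.0, -40.0, -20.0, -30.0)     # policy margin 30 vs ref margin 10
misaligned = dpo_loss(-40.0, -10.0, -20.0, -30.0)  # policy margin -30 vs ref margin 10
```

Minimizing this loss is what drives the growing `Rewards/margins` seen in the training logs below.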

## Training

### Training Data

[FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) was fine-tuned on the [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) dataset, which consists of synthetic preference data in Dutch. This dataset includes prompts along with preferred and less preferred responses, allowing the model to learn to generate more aligned responses through DPO.
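For illustration, a single preference example has roughly this shape (the field names and texts here are hypothetical, not the dataset's exact schema):

```python
# Hypothetical shape of one DPO training example; the actual
# ultra_feedback_dutch column names and contents may differ.
example = {
    "prompt": "Leg in eenvoudige taal uit wat een obligatie is.",
    "chosen": "Een obligatie is een lening aan een bedrijf of overheid ...",  # preferred
    "rejected": "Een obligatie is hetzelfde als een aandeel.",               # dispreferred
}

# DPO trains on (prompt, chosen, rejected) triples: it raises the
# likelihood gap between the chosen and rejected responses.
```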

### Training hyperparameters

The following hyperparameters were used during training:

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0352        | 0.7962 | 600  | 0.0278          | -3.8104        | -15.6430         | 0.9836             | 11.8327         | -1919.8119     | -780.2752    | -1.7437         | -1.8978       |
| 0.0238        | 0.9289 | 700  | 0.0279          | -3.8974        | -15.9642         | 0.9828             | 12.0668         | -1951.9310     | -788.9780    | -1.7371         | -1.8937       |

### Framework versions

- PEFT 0.11.1
- Transformers 4.42.4
- PyTorch 2.3.1
- Datasets 2.20.0
- Tokenizers 0.19.1

## How to Use

[FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) can be used with the Hugging Face Transformers library together with PEFT to load the adapters efficiently.

### Installation

Ensure you have the necessary libraries installed:

```bash
pip install torch transformers peft accelerate
```

### Loading the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("BramVanroy/GEITje-7B-ultra", use_fast=False)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained("BramVanroy/GEITje-7B-ultra", device_map="auto")

# Load the FinGEITje-7B-dpo PEFT adapters on top of the base model
model = PeftModel.from_pretrained(base_model, "snoels/FinGEITje-7B-dpo", device_map="auto")

# Optional: merge the adapters into the base weights for faster inference
# model = model.merge_and_unload()
```

### Generating Text

```python
# Prepare the input
input_text = "Wat zijn de laatste trends in de Nederlandse banksector?"
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)

# Generate a response
outputs = model.generate(input_ids, max_length=200, num_return_sequences=1)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
```

## Acknowledgements

We would like to thank:

- **Rijgersberg** ([GitHub](https://github.com/Rijgersberg)) for creating [GEITje](https://github.com/Rijgersberg/GEITje), one of the first Dutch foundation models.
- **Bram Vanroy** ([GitHub](https://github.com/BramVanroy)) for creating [GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra) and providing the ultra_feedback_dutch dataset.
- **Contributors of the [Alignment Handbook](https://github.com/huggingface/alignment-handbook)** for providing valuable resources that guided the development and training process of [FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo).

## Citation

If you use [FinGEITje-7B-dpo](https://huggingface.co/snoels/FinGEITje-7B-dpo) in your work, please cite:

```bibtex
@article{FinGEITje2024,
  title={A Dutch Financial Large Language Model},
  author={Noels, Sander and De Blaere, Jorne and De Bie, Tijl},
  journal={arXiv preprint arXiv:xxxx.xxxxx},
  year={2024},
  url={https://arxiv.org/abs/xxxx.xxxxx}
}
```

## License

This model is licensed under the [Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/) license.

## Contact

For any inquiries or questions, please contact [Sander Noels](mailto:[email protected]).

## Evaluation Results

The model achieves the following results on the evaluation set:

- Loss: 0.0279
- Rewards/chosen: -3.8986
- Rewards/rejected: -15.9713
- Rewards/accuracies: 0.9836
- Rewards/margins: 12.0727
- Logps/rejected: -1952.6360
- Logps/chosen: -789.0983
- Logits/rejected: -1.7369
- Logits/chosen: -1.8936
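As a quick consistency check on these metrics, `Rewards/margins` is simply `Rewards/chosen` minus `Rewards/rejected`:

```python
rewards_chosen = -3.8986
rewards_rejected = -15.9713

# The margin is the gap between chosen and rejected rewards
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # → 12.0727, matching the reported Rewards/margins
```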