Commit af0b0ff
Parent(s): 73b8946
Update README.md
README.md
CHANGED
--- a/README.md
@@ -1,38 +1,45 @@
 ---
 license: apache-2.0
 base_model: tiiuae/falcon-7b
-tags:
-- generated_from_trainer
 datasets:
 - yhavinga/mc4_nl_cleaned
 model-index:
-- name:
   results: []
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->

-#

-This model is a fine-tuned version of [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b)
-
-

-## Model description
-
-More information needed

 ## Intended uses & limitations

-

 ## Training and evaluation data

-

 ## Training procedure

 ### Training hyperparameters

 The following hyperparameters were used during training:
@@ -71,4 +78,4 @@ The following hyperparameters were used during training:
 - Transformers 4.31.0.dev0
 - Pytorch 2.0.1+cu117
 - Datasets 2.13.1
-- Tokenizers 0.13.3

+++ b/README.md
@@ -1,38 +1,45 @@
 ---
 license: apache-2.0
 base_model: tiiuae/falcon-7b
 datasets:
 - yhavinga/mc4_nl_cleaned
 model-index:
+- name: falcon-7b-ft-mc4_nl_cleaned_tiny
   results: []
+language:
+- nl
+inference: false
+tags:
+- falcon
 ---


+# falcon-7b-ft-mc4_nl_cleaned_tiny

+This model is a fine-tuned version of [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b),
+trained on the [yhavinga/mc4_nl_cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned/viewer/tiny/train) dataset (`tiny` partition) with a context length of 2048 tokens.
+See the original [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b) card for more information, intended uses, and biases.
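Since `inference: false` disables the hosted widget, a local generation sketch may help. The example below is hypothetical and not part of the committed card: the repository namespace is a placeholder, and `trust_remote_code=True` is assumed because Falcon checkpoints of this period shipped custom modeling code.

```python
# Hypothetical usage sketch; the namespace in `model_id` is a placeholder,
# not the card's actual repository id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<user>/falcon-7b-ft-mc4_nl_cleaned_tiny"  # placeholder namespace

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # Falcon-7B was released in bfloat16
    device_map="auto",            # requires `accelerate`
    trust_remote_code=True,       # assumed: Falcon checkpoints shipped custom code
)

prompt = "Nederland is een land waar"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```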


 ## Intended uses & limitations

+This model is intended as a (poor) baseline for Dutch generative LLMs. It does not aim for SOTA performance and is intended specifically for research purposes.
+
+Importantly, the original Falcon 7B model was trained only on English and French, so Dutch generations should be taken with a massive grain of salt. I
+wanted to see whether performance would be reasonable after fine-tuning this model on a Dutch dataset. I find that it is okay but not great; in particular, the output is often incoherent.

 ## Training and evaluation data

+Trained on the [yhavinga/mc4_nl_cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned/viewer/tiny/train) dataset (`tiny` partition) for one epoch. The canonical
+validation split was not used; instead, 5% of `train` was held out as validation.
+
+At a context length of 2048 tokens, the training set comprised around 2M (2,008,858) samples.
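The held-out split described above is straightforward to reproduce with `datasets`; a minimal sketch follows. The `tiny` config name follows the dataset viewer URL, the seed is an assumption, and the ~2M figure in the card counts 2048-token sequences after tokenization and packing, so raw row counts will differ.

```python
# Sketch of the validation split described above: ignore the canonical
# validation split and hold out 5% of `train` instead.
from datasets import load_dataset

dataset = load_dataset("yhavinga/mc4_nl_cleaned", "tiny", split="train")

# The 5% hold-out matches the card; the seed is an assumption.
splits = dataset.train_test_split(test_size=0.05, seed=42)
train_ds, valid_ds = splits["train"], splits["test"]
print(f"train: {len(train_ds):,} rows, validation: {len(valid_ds):,} rows")
```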
+

 ## Training procedure

+Trained with LoRA in 4-bit precision and merged before upload. The adapters are available in the `adapters` branch.
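For context, a 4-bit LoRA (QLoRA-style) fine-tune is commonly assembled with `bitsandbytes` and `peft` as sketched below. The rank, alpha, dropout, and target modules are illustrative assumptions, not this model's recorded settings.

```python
# Hypothetical sketch of a 4-bit LoRA fine-tune; r, lora_alpha, lora_dropout
# and target_modules are assumptions, not this model's documented settings.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=16,                                # assumption
    lora_alpha=32,                       # assumption
    lora_dropout=0.05,                   # assumption
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# ... train, e.g. with the HF Trainer, using the hyperparameters listed below ...

# Merge the adapters into the base weights, as was done for this upload.
# Merging on top of 4-bit weights may require reloading the base model in
# half precision first, depending on the peft version.
merged = model.merge_and_unload()
```

The unmerged adapters in the `adapters` branch could presumably be loaded instead with `peft.PeftModel.from_pretrained(base, repo_id, revision="adapters")`, assuming standard adapter files are stored there.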
+

 ### Training hyperparameters

 The following hyperparameters were used during training:

@@ -71,4 +78,4 @@ The following hyperparameters were used during training:
 - Transformers 4.31.0.dev0
 - Pytorch 2.0.1+cu117
 - Datasets 2.13.1
+- Tokenizers 0.13.3