---
license: mit
datasets:
- DDSC/dagw_no_twitter
language:
- da
tags:
- SimCSE
---

A version of chcaa/dfm-encoder-large-v1 trained using SimCSE. It was trained as part of the [Scandinavian Embedding Benchmark](https://kennethenevoldsen.github.io/scandinavian-embedding-benchmark/) to establish a naive baseline for SimCSE.

**Note**: We do not recommend this model. Instead, we encourage users to check out the current best model on [SEB](https://kennethenevoldsen.github.io/scandinavian-embedding-benchmark/) or the [recommendation](https://huggingface.co/collections/danish-foundation-models/state-of-the-art-danish-models-65f01d84a10842712e186172) by the Danish Foundation Models team.

## Hyperparameters

Trained using the [SimCSE](https://github.com/princeton-nlp/SimCSE) implementation with:

```
# data/dfm_paragraphs.txt contains paragraphs extracted from the Danish Gigaword
CUDA_VISIBLE_DEVICES=0 python train.py \
    --train_file data/dfm_paragraphs.txt \
    --model_name_or_path chcaa/dfm-encoder-large-v1 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 128 \
    --learning_rate 1e-5 \
    --max_seq_length 32 \
    --evaluation_strategy steps \
    --metric_for_best_model stsb_spearman \
    --load_best_model_at_end \
    --pooler_type cls \
    --mlp_only_train \
    --do_mlm \
    --overwrite_output_dir \
    --temp 0.05 \
    --do_train \
    --fp16
```

## Citation

To cite this work, please refer to the following article:

```
Enevoldsen, K., Kardos, M., Muennighoff, N., & Nielbo, K. (2024). The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding.
https://openreview.net/forum?id=pJl_i7HIA72
```

or use the following BibTeX:

```
@article{enevoldsenScandinavianEmbeddingBenchmarks2024,
    title = {The {Scandinavian} {Embedding} {Benchmarks}: {Comprehensive} {Assessment} of {Multilingual} and {Monolingual} {Text} {Embedding}},
    shorttitle = {The {Scandinavian} {Embedding} {Benchmarks}},
    url = {https://openreview.net/forum?id=pJl_i7HIA72},
    language = {en},
    urldate = {2024-04-12},
    author = {Enevoldsen, Kenneth and Kardos, Márton and Muennighoff, Niklas and Nielbo, Kristoffer},
    month = feb,
    year = {2024},
}
```
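
## Sketch of the training objective

For readers unfamiliar with what `--temp 0.05` in the training command controls, the following is a minimal, illustrative sketch of the contrastive (InfoNCE) loss that unsupervised SimCSE optimizes. It is not the code from the SimCSE repository: the function names `cosine` and `simcse_loss` and the toy two-dimensional vectors are assumptions made for illustration. In actual training the two views come from encoding the same sentence twice with different dropout masks, and the run above additionally adds a masked language modelling term (`--do_mlm`) that is omitted here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_loss(view_a, view_b, temp=0.05):
    """Mean InfoNCE loss over a batch (hypothetical helper, for illustration).

    view_a[i] and view_b[i] are two embeddings of sentence i (the positive
    pair); every other row of view_b serves as an in-batch negative.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        # Similarities are divided by the temperature before the softmax.
        logits = [cosine(anchor, other) / temp for other in view_b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the matching index as the correct class.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy batch: each row of view_b is a slightly perturbed copy of view_a.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = simcse_loss(view_a, view_b)         # positives on the diagonal
shuffled = simcse_loss(view_a, view_b[::-1])  # positives deliberately mismatched
print(f"aligned: {aligned:.6f}, shuffled: {shuffled:.3f}")
```

With a low temperature such as 0.05, small differences in cosine similarity are amplified before the softmax, so the loss is close to zero when positives are correctly matched and large when they are not.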