---
language: de
license: mit
tags:
- sentence_embedding
- search
- pytorch
- xlm-roberta
- roberta
- xlm-r-distilroberta-base-paraphrase-v1
- paraphrase
datasets:
- STSbenchmark
metrics:
- Spearman’s rank correlation
- cosine similarity
---

# German RoBERTa for Sentence Embeddings V2
**The newer [T-Systems-onsite/cross-en-de-roberta-sentence-transformer](https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer) model is slightly better for German. It is also the current best model for English and works cross-lingually. Please consider using that model instead.**

This model is intended to [compute sentence (text) embeddings](https://www.sbert.net/docs/usage/computing_sentence_embeddings.html) for German text. These embeddings can then be compared with [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to find sentences with a similar semantic meaning. This is useful, for example, for [semantic textual similarity](https://www.sbert.net/docs/usage/semantic_textual_similarity.html), [semantic search](https://www.sbert.net/docs/usage/semantic_search.html), or [paraphrase mining](https://www.sbert.net/docs/usage/paraphrase_mining.html). To do this you have to use the [Sentence Transformers Python framework](https://github.com/UKPLab/sentence-transformers).

> Sentence-BERT (SBERT) is a modification of the pretrained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT.

Source: [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)

This model was fine-tuned by [Philip May](https://eniak.de/) and open-sourced by [T-Systems-onsite](https://www.t-systems-onsite.de/). Special thanks to [Nils Reimers](https://www.nils-reimers.de/) for his awesome open-source work on Sentence Transformers, the pretrained models, and his help on GitHub.

## How to use
**The usage description above (provided by Hugging Face) is wrong for sentence embeddings! Please use this instead:**

To use this model, install the `sentence-transformers` package (see here: <https://github.com/UKPLab/sentence-transformers>).

```python
from sentence_transformers import SentenceTransformer

# load the sentence-embedding model from the Hugging Face model hub
model = SentenceTransformer('T-Systems-onsite/german-roberta-sentence-transformer-v2')
```
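
To see the embeddings in action, you can encode two sentences and compare them with cosine similarity. This is only a small sketch; the German example sentences below are illustrative and not part of the original card:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('T-Systems-onsite/german-roberta-sentence-transformer-v2')

# two example sentences with a similar meaning (illustrative only)
sentences = [
    'Das ist ein schönes Auto.',
    'Der Wagen sieht sehr gut aus.',
]

# compute the sentence embeddings as PyTorch tensors
embeddings = model.encode(sentences, convert_to_tensor=True)

# a cosine similarity close to 1.0 indicates a similar meaning
similarity = util.pytorch_cos_sim(embeddings[0], embeddings[1])
print(float(similarity))
```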

For details of usage and examples see here:
- [Computing Sentence Embeddings](https://www.sbert.net/docs/usage/computing_sentence_embeddings.html)
- [Semantic Textual Similarity](https://www.sbert.net/docs/usage/semantic_textual_similarity.html)
- [Paraphrase Mining](https://www.sbert.net/docs/usage/paraphrase_mining.html)
- [Semantic Search](https://www.sbert.net/docs/usage/semantic_search.html)
- [Cross-Encoders](https://www.sbert.net/docs/usage/cross-encoder.html)
- [Examples on GitHub](https://github.com/UKPLab/sentence-transformers/tree/master/examples)

## Training
The base model is [xlm-roberta-base](https://huggingface.co/xlm-roberta-base). This model has been further trained by [Nils Reimers](https://www.nils-reimers.de/) on a large-scale paraphrase dataset for 50+ languages. [Nils Reimers](https://www.nils-reimers.de/) wrote about this [on GitHub](https://github.com/UKPLab/sentence-transformers/issues/509#issuecomment-712243280):

>A paper is upcoming for the paraphrase models.
>
>These models were trained on various datasets with Millions of examples for paraphrases, mainly derived from Wikipedia edit logs, paraphrases mined from Wikipedia and SimpleWiki, paraphrases from news reports, AllNLI-entailment pairs with in-batch-negative loss etc.
>
>In internal tests, they perform much better than the NLI+STSb models as they have see more and broader type of training data. NLI+STSb has the issue that they are rather narrow in their domain and do not contain any domain specific words / sentences (like from chemistry, computer science, math etc.). The paraphrase models has seen plenty of sentences from various domains.
>
>More details with the setup, all the datasets, and a wider evaluation will follow soon.

The resulting model, called `xlm-r-distilroberta-base-paraphrase-v1`, has been released here: <https://github.com/UKPLab/sentence-transformers/releases/tag/v0.3.8>

Building on this cross-language model, we fine-tuned it for German on the [deepl.com](https://www.deepl.com/translator) dataset of our [German STSbenchmark dataset](https://github.com/t-systems-on-site-services-gmbh/german-STSbenchmark).

We did an automatic hyperparameter search over 102 trials with [Optuna](https://github.com/optuna/optuna). Using 10-fold cross-validation on the deepl.com test and dev dataset, we found the following best hyperparameters:
- batch_size = 15
- num_epochs = 4
- lr = 2.2995320905210864e-05
- eps = 1.8979875906303792e-06
- weight_decay = 0.003314045812507563
- warmup_steps_proportion = 0.46141685205829014

The final model was trained with these hyperparameters on the combination of `sts_de_train.csv` and `sts_de_dev.csv`. The `sts_de_test.csv` file was held out for testing.
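
The card does not include the training script itself. The following is only a sketch of how these hyperparameters could be plugged into the `sentence-transformers` `fit` API; the data-loading helper, the assumed STSbenchmark column layout of the German CSV files, and the choice of `CosineSimilarityLoss` are illustrative assumptions, not a reproduction of the actual training code.

```python
import csv
import math

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader


def load_sts_examples(path):
    """Read a German STSbenchmark CSV into InputExamples.

    Assumes the original STSbenchmark layout: tab-separated rows where
    column 4 is the gold score (0-5) and columns 5 and 6 are the sentences.
    """
    examples = []
    with open(path, encoding='utf-8') as f:
        for row in csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE):
            score = float(row[4]) / 5.0  # normalize to [0, 1] for CosineSimilarityLoss
            examples.append(InputExample(texts=[row[5], row[6]], label=score))
    return examples


# hyperparameters found by the Optuna search (see the list above)
batch_size = 15
num_epochs = 4
lr = 2.2995320905210864e-05
eps = 1.8979875906303792e-06
weight_decay = 0.003314045812507563
warmup_steps_proportion = 0.46141685205829014

# start from the multilingual paraphrase model this card builds on
model = SentenceTransformer('xlm-r-distilroberta-base-paraphrase-v1')

train_examples = load_sts_examples('sts_de_train.csv') + load_sts_examples('sts_de_dev.csv')
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=batch_size)
train_loss = losses.CosineSimilarityLoss(model)

# translate the warmup proportion into an absolute number of warmup steps
total_steps = len(train_dataloader) * num_epochs
warmup_steps = math.ceil(total_steps * warmup_steps_proportion)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=num_epochs,
    warmup_steps=warmup_steps,
    optimizer_params={'lr': lr, 'eps': eps},
    weight_decay=weight_decay,
    output_path='german-roberta-sentence-transformer-v2',
)
```

Note that `warmup_steps_proportion` is converted to an absolute `warmup_steps` count, since `fit` expects a number of steps rather than a fraction.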

## Evaluation
The evaluation has been done on the test set of our [German STSbenchmark dataset](https://github.com/t-systems-on-site-services-gmbh/german-STSbenchmark). The code is available on [Colab](https://colab.research.google.com/drive/1aCWOqDQx953kEnQ5k4Qn7uiixokocOHv?usp=sharing). As the evaluation metric, we use Spearman’s rank correlation between the cosine similarity of the sentence embeddings and the STSbenchmark labels.

| Model Name | Spearman rank correlation<br/>(German) |
|--------------------------------------|-------------------------------------|
| xlm-r-distilroberta-base-paraphrase-v1 | 0.8079 |
| xlm-r-100langs-bert-base-nli-stsb-mean-tokens | 0.8194 |
| xlm-r-bert-base-nli-stsb-mean-tokens | 0.8194 |
| **T-Systems-onsite/<br/>german-roberta-sentence-transformer-v2** | **0.8529** |
| **[T-Systems-onsite/<br/>cross-en-de-roberta-sentence-transformer](https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer)** | **0.8550** |
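
This metric is straightforward to reproduce. The snippet below is a small sketch (not the Colab notebook linked above) that assumes `sts_de_test.csv` follows the original STSbenchmark column layout and uses scipy to compute the Spearman correlation:

```python
import csv

from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('T-Systems-onsite/german-roberta-sentence-transformer-v2')

# read the held-out test split; column layout assumed as in the original STSbenchmark
sentences1, sentences2, gold_scores = [], [], []
with open('sts_de_test.csv', encoding='utf-8') as f:
    for row in csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE):
        gold_scores.append(float(row[4]))
        sentences1.append(row[5])
        sentences2.append(row[6])

# embed both sides and compute the cosine similarity of each sentence pair
emb1 = model.encode(sentences1, convert_to_tensor=True)
emb2 = model.encode(sentences2, convert_to_tensor=True)
cosine_scores = util.pytorch_cos_sim(emb1, emb2).diagonal().cpu().numpy()

# Spearman's rank correlation between cosine similarities and gold labels
correlation, _ = spearmanr(cosine_scores, gold_scores)
print(f'Spearman rank correlation: {correlation:.4f}')
```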