Update README.md
README.md
CHANGED
@@ -46,7 +46,6 @@ print(embeddings)
 ```
 
 
-
 ## Evaluation Results
 
 I will add the model specific evaluation results once the instance is running again.
@@ -60,7 +59,7 @@ The model was trained with the parameters:
 
 **Loss**:
 
-`sentence_transformers.losses.MultipleNegativesRankingLoss
+`sentence_transformers.losses.MultipleNegativesRankingLoss` with parameters:
 ```
 {'scale': 20.0, 'similarity_fct': 'cos_sim'}
 ```
@@ -99,6 +98,8 @@ SentenceTransformer(
 
 #### Cheap Character Noise for OCR-Robust Multilingual Embeddings (introducing paper)
 
+For details on the adaptation methodology please refer to our paper (published in ACL2025 Findings). If you use our models or methodology, please cite our work.
+
 ```bibtex
 update once available
 ```
@@ -113,4 +114,25 @@ update once available
 booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track},
 pages={1393--1412},
 year={2024}
-}
+}
+```
+
+## About Impresso
+
+### Impresso project
+
+[Impresso - Media Monitoring of the Past](https://impresso-project.ch) is an interdisciplinary research project that aims to develop and consolidate tools for processing and exploring large collections of media archives across modalities, time, languages and national borders. The first project (2017-2021) was funded by the Swiss National Science Foundation under grant No. [CRSII5_173719](http://p3.snf.ch/project-173719) and the second project (2023-2027) by the SNSF under grant No. [CRSII5_213585](https://data.snf.ch/grants/grant/213585) and the Luxembourg National Research Fund under grant No. 17498891.
+
+### Copyright
+
+Copyright (C) 2025 The Impresso team.
+
+### License
+
+This program is provided as open source under the [GNU Affero General Public License](https://github.com/impresso/impresso-pyindexation/blob/master/LICENSE) v3 or later.
+
+---
+
+<p align="center">
+  <img src="https://github.com/impresso/impresso.github.io/blob/master/assets/images/3x1--Yellow-Impresso-Black-on-White--transparent.png?raw=true" width="350" alt="Impresso Project Logo"/>
+</p>
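
For readers unfamiliar with the loss named in the diff, here is a minimal pure-Python sketch of what `MultipleNegativesRankingLoss` with `{'scale': 20.0, 'similarity_fct': 'cos_sim'}` computes conceptually. Each anchor is scored against every positive in the batch by scaled cosine similarity, and cross-entropy is taken with the matching positive as the target. The helper names `cos_sim` and `mnr_loss` and the toy vectors are assumptions made for illustration; this is not the sentence-transformers implementation and is not taken from the README.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i].

    All other positives in the batch act as in-batch negatives.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every candidate in the batch.
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the matching positive.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch (hypothetical embeddings): each anchor is close to its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.1], [0.1, 1.0]]

aligned = mnr_loss(anchors, positives)         # pairs match: loss near zero
shuffled = mnr_loss(anchors, positives[::-1])  # pairs swapped: loss is large
```

The `scale` factor sharpens the softmax over cosine similarities, which otherwise lie in a narrow range of [-1, 1]; with correctly paired inputs the loss approaches zero, while mismatched pairs are penalized heavily.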