Marc Olejak
MarcGrumpyOlejak
6 followers · 45 following
AI & ML interests
Practical, low-cost ML: playing around with German bureaucratic language, and still using Levenshtein.
Recent Activity
reacted to tomaarsen's post with 🔥 10 days ago
🐦🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support and more.

Details:
- CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure: just `device=["cuda:0", "cuda:1"]` or `device=["cpu"]*4` on the `model.predict` or `model.rank` calls.
- Multilingual NanoBEIR Support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing `dataset_id`, e.g. `dataset_id="lightonai/NanoBEIR-de"` for the German benchmark.
- Similarity scores in Hard Negatives Mining: When mining for hard negatives to create a strong training dataset, you can now pass `output_scores=True` to get similarity scores returned. This can be useful for some distillation losses!
- Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!
- Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.

Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0

I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting support. Specifically, better multimodality, rerankers, and perhaps some late interaction in the future!
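The CrossEncoder multi-processing option described in the post could look roughly like the sketch below. It is not taken from the release itself: the helper names `cpu_worker_devices` and `rerank` are hypothetical, the checkpoint name is only an example, and the `device=[...]` argument on `rank` is used as the post describes it for v5.2.0.

```python
from typing import List


def cpu_worker_devices(num_workers: int) -> List[str]:
    """Build a device list for CPU multi-processing, e.g. ["cpu", "cpu"].

    Hypothetical helper: equivalent to the `["cpu"] * 4` pattern in the post.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return ["cpu"] * num_workers


def rerank(query: str, documents: List[str], devices: List[str]):
    """Rank `documents` against `query`, spreading work across `devices`.

    Hypothetical wrapper around CrossEncoder.rank.
    """
    # Imported lazily so the helper above works without the library installed.
    from sentence_transformers import CrossEncoder

    # Assumption: this checkpoint is only an example; any reranker would do.
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

    # As described in the post, passing a list of devices to `rank`
    # is what enables multi-processing (assumes sentence-transformers >= 5.2.0).
    return model.rank(query, documents, device=devices)
```

When trying this, it is probably safest to call `rerank(query, docs, cpu_worker_devices(2))` from under an `if __name__ == "__main__":` guard, since Python multi-processing may re-import the calling module in each worker.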
liked a model 16 days ago
dbmdz/bert-base-german-uncased
liked a dataset about 1 month ago
PleIAs/SYNTH
Organizations
None yet
MarcGrumpyOlejak's datasets (12)
MarcGrumpyOlejak/gooaq_mt_german • Updated Nov 24 • 3.01M • 27
MarcGrumpyOlejak/LCC_deu_news_1M_bt • Updated Aug 6 • 17.4M • 53
MarcGrumpyOlejak/gooaq_mt_german_0_hard_negatives • Updated Jul 30 • 623k • 18
MarcGrumpyOlejak/gooaq_mt_german_5_hard_negatives • Updated Jul 30 • 2.08M • 218
MarcGrumpyOlejak/mmarco-de-distilled-scored • Updated Jun 13 • 315k • 7
MarcGrumpyOlejak/germanrag-scored • Updated Jun 10 • 3.36k • 9
MarcGrumpyOlejak/german-oasst1-qa-format-scored • Updated Jun 10 • 10.4k • 15
MarcGrumpyOlejak/swim-ir-monolingual-de-scored • Updated Jun 2 • 447k • 19
MarcGrumpyOlejak/slimorca_dedup_german_experimental-scored • Updated Jun 2 • 322k • 25
MarcGrumpyOlejak/gpt-4-self-instruct-german-scored • Updated Jun 2 • 10k • 12
MarcGrumpyOlejak/ultradistil-intel-orca-dpo-de-scored • Updated Jun 2 • 5.92k • 13
MarcGrumpyOlejak/alpaca-gpt4_de-scored • Updated Jun 2 • 50k • 10