Tollef J

tollefj

AI & ML interests

Coreference resolution, span prediction, summarization, topic modeling

Recent Activity

liked a model 9 days ago
ResembleAI/chatterbox
liked a model 12 days ago
microsoft/Phi-4-reasoning
liked a model 12 days ago
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

Organizations

Hugging Face Discord Community

tollefj's activity

upvoted an article 3 months ago

Training and Finetuning Reranker Models with Sentence Transformers v4

By tomaarsen
135
view reply

Why are there so few languages involved in the training of these models? You argue that this data mix was selected "to create a corpus of European and most widely spoken languages, representing a broad range of alphabets and cultures."
But what is the relevance of other alphabets when, for example, you do not include any Nordic languages with large, high-quality datasets?

Prefixing it with "Euro" seems odd in this context. You have selected only a tiny fraction of the languages, so name it accordingly :-)
It would also make sense to refer to EuroEval: https://euroeval.com/leaderboards/