Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA Paper • 2505.21115 • Published May 2025 • 111
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 2025 • 73
Article MIEB: The Benchmark That Stress-Tests Image-Text Embeddings Like Never Before By isaacchung and 2 others • Apr 24 • 14
USER2 Collection Universal Sentence Encoder for Russian based on RuModernBERT, with support for context lengths up to 8,192 tokens and Matryoshka representation learning • 2 items • Updated Apr 18 • 4
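As a hedged illustration of how a Matryoshka-capable encoder like USER2 is typically used, the sketch below loads it with sentence-transformers and truncates embeddings to a smaller dimension; the repo id `deepvk/USER2-base` is an assumption inferred from the collection name, not confirmed by this listing.

```python
# Minimal sketch, assuming the collection's models live at "deepvk/USER2-base"
# (assumed repo id) and follow the standard sentence-transformers layout.
from sentence_transformers import SentenceTransformer

# Matryoshka representation learning lets you keep only the leading dimensions
# of each embedding; truncate_dim is the standard sentence-transformers knob.
model = SentenceTransformer("deepvk/USER2-base", truncate_dim=256)

sentences = ["Пример русского предложения.", "Another sentence."]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 256) after Matryoshka truncation
```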
Article Training and Finetuning Reranker Models with Sentence Transformers v4 By tomaarsen • Mar 26 • 135
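For orientation, here is a minimal usage sketch of the CrossEncoder reranker API that the article builds on; the checkpoint name is a public example model, not the reranker trained in the article itself.

```python
# Minimal sketch of reranking with a sentence-transformers CrossEncoder;
# "cross-encoder/ms-marco-MiniLM-L-6-v2" is a public example checkpoint,
# not the model trained in the article.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [
    ("how to train a reranker", "Sentence Transformers v4 adds a CrossEncoder trainer."),
    ("how to train a reranker", "The weather was pleasant yesterday."),
]
scores = model.predict(pairs)  # higher score = more relevant passage
print(scores)
```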
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation Paper • 2503.16660 • Published Mar 20 • 73
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published Mar 5 • 234
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 95
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 145
Rank1: Test-Time Compute for Reranking in Information Retrieval Paper • 2502.18418 • Published Feb 25 • 27
GHOST 2.0: generative high-fidelity one shot transfer of heads Paper • 2502.18417 • Published Feb 25 • 67
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published Feb 20 • 175
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? Paper • 2502.14502 • Published Feb 20 • 91
Article Train 400x faster Static Embedding Models with Sentence Transformers By tomaarsen • Jan 15 • 187
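A hedged sketch of using a static embedding model of the kind the article describes: such models replace the transformer forward pass with embedding lookups, which is where the large CPU speedup comes from. The repo id `sentence-transformers/static-retrieval-mrl-en-v1` is an assumption about one of the released checkpoints.

```python
# Minimal sketch; "sentence-transformers/static-retrieval-mrl-en-v1" is assumed
# to be one of the static embedding models the article releases.
from sentence_transformers import SentenceTransformer

# Static embedding models skip the transformer forward pass entirely,
# so encoding reduces to token-embedding lookups plus pooling.
model = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")
embeddings = model.encode(["fast lookup-based embeddings", "no attention layers"])
print(embeddings.shape)
```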
The Differences Between Direct Alignment Algorithms are a Blur Paper • 2502.01237 • Published Feb 3 • 115
NanoBEIR 🍺 Collection A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 17
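As a hedged sketch of working with one NanoBEIR dataset via 🤗 datasets: the repo id `zeta-alpha-ai/NanoNFCorpus` and the `corpus`/`queries` config names are assumptions based on the usual BEIR layout, not details stated in this listing.

```python
# Minimal sketch; the repo id "zeta-alpha-ai/NanoNFCorpus" and the
# "corpus"/"queries" config names are assumptions from the usual BEIR layout.
from datasets import load_dataset

corpus = load_dataset("zeta-alpha-ai/NanoNFCorpus", "corpus", split="train")
queries = load_dataset("zeta-alpha-ai/NanoNFCorpus", "queries", split="train")
print(len(queries), "queries;", len(corpus), "documents")  # ~50 queries, up to 10K docs
```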