SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning Paper • 2505.02363 • Published May 5, 2025
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains Paper • 2505.03981 • Published May 6, 2025
DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities Paper • 2410.07722 • Published Oct 10, 2024
SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions Paper • 1806.05258 • Published Jun 13, 2018
MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering Paper • 2305.12820 • Published May 22, 2023
PARADE: Passage Representation Aggregation for Document Reranking Paper • 2008.09093 • Published Aug 20, 2020
BERT-QE: Contextualized Query Expansion for Document Re-ranking Paper • 2009.07258 • Published Sep 15, 2020
Pretrained Transformers for Text Ranking: BERT and Beyond Paper • 2010.06467 • Published Oct 13, 2020
Meta-Task Prompting Elicits Embedding from Large Language Models Paper • 2402.18458 • Published Feb 28, 2024
Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models Paper • 2310.00840 • Published Oct 2, 2023
Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models Paper • 2408.06663 • Published Aug 13, 2024
Nugget 2D: Dynamic Contextual Compression for Scaling Decoder-only Language Models Paper • 2310.02409 • Published Oct 3, 2023
Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents Paper • 2402.17896 • Published Feb 27, 2024
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation Paper • 2406.17186 • Published Jun 24, 2024