TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations Paper • 2505.18125 • Published May 23 • 109
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents Paper • 2410.06703 • Published Oct 9, 2024 • 2
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation Paper • 2503.19693 • Published Mar 25 • 75
GLEE: A Unified Framework and Benchmark for Language-based Economic Environments Paper • 2410.05254 • Published Oct 7, 2024 • 85
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations Paper • 2410.02707 • Published Oct 3, 2024 • 49