Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games Paper • 2506.03610 • Published 7 days ago • 9
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates Paper • 2505.22943 • Published 13 days ago • 4
TLDR: Token-Level Detective Reward Model for Large Vision Language Models Paper • 2410.04734 • Published Oct 7, 2024 • 17
Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue Paper • 2002.07510 • Published Feb 18, 2020 • 1
Who Wrote this Code? Watermarking for Code Generation Paper • 2305.15060 • Published May 24, 2023 • 1
MPCHAT: Towards Multimodal Persona-Grounded Conversation Paper • 2305.17388 • Published May 27, 2023 • 1
TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models Paper • 2405.18027 • Published May 28, 2024 • 1