testcase-eval

non-profit

Recent Activity

yilunzhao authored a paper 2 days ago

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

yilunzhao authored a paper 26 days ago

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

Raywithyou authored a paper 26 days ago

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

authored a paper 2 days ago

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

Paper • 2606.24551 • Published 9 days ago • 28

authored a paper 26 days ago

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

Paper • 2606.05259 • Published 28 days ago • 39

authored 2 papers about 1 month ago

A Survey of Reasoning-Intensive Retrieval: Progress and Challenges

Paper • 2605.00063 • Published Apr 30

OpenComputer: Verifiable Software Worlds for Computer-Use Agents

Paper • 2605.19769 • Published May 19 • 85

authored 7 papers about 2 months ago

ANCHOR: Branch-Point Data Generation for GUI Agents

Paper • 2602.07153 • Published Feb 6 • 5

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

Paper • 2410.09207 • Published Oct 11, 2024

Analyzing Diffusion and Autoregressive Vision Language Models in Multimodal Embedding Space

Paper • 2602.06056 • Published Jan 19

TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction

Paper • 2604.22880 • Published Apr 24 • 10

A Survey of Multimodal Mathematical Reasoning: From Perception, Alignment to Reasoning

Paper • 2603.08291 • Published Apr 14

Step-level Optimization for Efficient Computer-use Agents

Paper • 2604.27151 • Published Apr 29 • 19

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

Paper • 2605.04018 • Published May 5 • 41

submitted a paper to Daily Papers about 2 months ago

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

Paper • 2605.04018 • Published May 5 • 41

authored a paper 4 months ago

RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation

Paper • 2603.09723 • Published Mar 10 • 7

authored 6 papers 5 months ago

AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research

Paper • 2507.13300 • Published Jul 17, 2025 • 20

PuzzlePlex: Benchmarking Foundation Models on Reasoning and Planning with Puzzles

Paper • 2510.06475 • Published Oct 7, 2025 • 2

MSRS: Evaluating Multi-Source Retrieval-Augmented Generation

Paper • 2508.20867 • Published Aug 28, 2025

FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering

Paper • 2510.06426 • Published Oct 7, 2025 • 3

SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing

Paper • 2506.04583 • Published Jun 5, 2025

FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents

Paper • 2411.05764 • Published Nov 8, 2024