AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning Paper • 2511.19304 • Published Nov 24, 2025 • 90
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published Oct 27, 2025 • 121
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation Paper • 2506.14028 • Published Jun 16, 2025 • 93