Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published 8 days ago • 86
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond Paper • 2605.19660 • Published 3 days ago • 36
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions Paper • 2605.20087 • Published 3 days ago • 11
EnvFactory Collection These are the checkpoints and dataset for EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL. • 7 items • Updated 2 days ago • 1
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL Paper • 2605.18703 • Published 4 days ago • 44
Efficient RLVR Training via Weighted Mutual Information Data Selection Paper • 2603.01907 • Published Mar 2 • 14
ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall Paper • 2510.07896 • Published Oct 9, 2025 • 9
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published Feb 24 • 103
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published Feb 4 • 22
Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning Paper • 2602.01745 • Published Feb 2 • 7
Improving Data and Reward Design for Scientific Reasoning in Large Language Models Paper • 2602.08321 • Published Feb 9 • 44
LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth Paper • 2602.07962 • Published Feb 8 • 24
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration Paper • 2602.01734 • Published Feb 2 • 32
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published Jan 18 • 52
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28, 2025 • 73
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published Oct 27, 2025 • 98
QueST: Incentivizing LLMs to Generate Difficult Problems Paper • 2510.17715 • Published Oct 20, 2025 • 35
Scaling Language-Centric Omnimodal Representation Learning Paper • 2510.11693 • Published Oct 13, 2025 • 108
TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios Paper • 2505.12891 • Published May 19, 2025 • 10