TDRM (Collection): Learning Smooth Reward Models with Temporal Difference for LLM RL and Inference • 14 items • Updated 2 days ago • 2
GLM-4.5 (Collection): An open-source large language model designed for intelligent agents by Z.ai • 11 items • Updated 30 days ago • 230
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents (Paper) • 2410.24024 • Published Oct 31, 2024 • 51