DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published May 25 • 28
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published May 19 • 46
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning Paper • 2410.14208 • Published Oct 18, 2024 • 3
view article Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models By loubnabnl and 2 others • Mar 20, 2024 • 99
view article Article Fine-tuning Llama 2 70B using PyTorch FSDP By smangrul and 3 others • Sep 13, 2023 • 27
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024 • 51