Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning Paper • 2506.03136 • Published 3 days ago • 22
A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published 3 days ago • 30
ARIA: Training Language Agents with Intention-Driven Reward Aggregation Paper • 2506.00539 • Published 7 days ago • 26
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published 12 days ago • 101
Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph Paper • 2505.17507 • Published 15 days ago • 3
HuggingBench Collection • Collection of data of HuggingKG and HuggingBench. • 5 items • Updated 10 days ago • 1
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9, 2024 • 66