SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Paper
• 2602.12670 • Published
Are LLM Decisions Faithful to Verbal Confidence?
LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning