Roee Aharoni

roeeaharoni

http://www.roeeaharoni.com

AI & ML interests

Natural Language Processing

roeeaharoni's activity

upvoted a paper about 1 month ago

RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation

Paper • 2504.17502 • Published Apr 24 • 56

upvoted a paper 2 months ago

Inside-Out: Hidden Factual Knowledge in LLMs

Paper • 2503.15299 • Published Mar 19 • 55

liked a dataset 10 months ago

google/granola-entity-questions

Updated Aug 1, 2024 • 12.5k • 28 • 8

reacted to gsarti's post with 🤗 over 1 year ago

Post

🔍 Today's pick in Interpretability & Analysis of LMs: A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains by @alonjacovi, @yonatanbitton, B. Bohnet, J. Herzig, @orhonovic, M. Tseng, M. Collins, @roeeaharoni, @mega

This work introduces a new methodology for human verification of reasoning chains and uses it to annotate a dataset of chain-of-thought (CoT) reasoning chains produced by three LMs. The resulting annotated dataset, REVEAL, can be used to benchmark automatic verifiers of LM reasoning.

In their analysis, the authors find that LM-produced CoTs generally contain faulty steps, which often lead to incorrect automatic verification. In particular, CoT-generating LMs often produce non-attributable reasoning steps, and reasoning verifiers generally struggle to verify logical correctness.

📄 Paper: A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains (2402.00559)
🔡 Dataset: google/reveal