cot_encyclopedia_human_eval

AI & ML interests

None defined yet.

Recent Activity

seungone authored a paper 4 days ago

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

Seongyun authored a paper 4 days ago

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

Seongyun authored a paper 24 days ago

How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?

admin2927's activity

seungone

authored a paper 4 days ago

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

Paper • 2505.10185 • Published 5 days ago • 21

Seongyun

authored a paper 4 days ago

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

Paper • 2505.10185 • Published 5 days ago • 21

Seongyun

authored 3 papers 24 days ago

How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?

Paper • 2410.07571 • Published Oct 10, 2024 • 2

Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators

Paper • 2503.19877 • Published Mar 25

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published 26 days ago • 109

Seongyun

updated a dataset about 1 month ago

Seongyun/human_eval_1

Viewer • Updated Apr 12 • 100 • 16

seungone

authored 2 papers about 1 month ago

M-Prometheus: A Suite of Open Multilingual LLM Judges

Paper • 2504.04953 • Published Apr 7

Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators

Paper • 2503.19877 • Published Mar 25

seungone

authored 2 papers 5 months ago

LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation

Paper • 2412.10424 • Published Dec 10, 2024 • 2

Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published Dec 19, 2024 • 9

Seongyun

authored a paper 5 months ago

Evaluating Language Models as Synthetic Data Generators

Paper • 2412.03679 • Published Dec 4, 2024 • 49

seungone

authored a paper 5 months ago

Evaluating Language Models as Synthetic Data Generators

Paper • 2412.03679 • Published Dec 4, 2024 • 49

seungone

authored 3 papers 7 months ago

MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models

Paper • 2410.17578 • Published Oct 23, 2024 • 1

Better Instruction-Following Through Minimum Bayes Risk

Paper • 2410.02902 • Published Oct 3, 2024

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

Paper • 2410.16153 • Published Oct 21, 2024 • 45

seungone

authored 2 papers 9 months ago

Consent in Crisis: The Rapid Decline of the AI Data Commons

Paper • 2407.14933 • Published Jul 20, 2024 • 12

Can Language Models Evaluate Human Written Text? Case Study on Korean Student Writing for Education

Paper • 2407.17022 • Published Jul 24, 2024

Seongyun

authored 2 papers 9 months ago

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Paper • 2406.05761 • Published Jun 9, 2024 • 3

LIQUID: A Framework for List Question Answering Dataset Generation

Paper • 2302.01691 • Published Feb 3, 2023

seungone

authored a paper 11 months ago

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Paper • 2406.05761 • Published Jun 9, 2024 • 3

Team members 2
