Clémentine Fourrier

clefourrier

http://clefourrier.github.io

AI & ML interests

None yet

Recent Activity

updated a dataset about 10 hours ago

gaia-benchmark/results_public

new activity about 12 hours ago

gaia-benchmark/leaderboard:What is Multi-Agent?

upvoted a paper about 12 hours ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

clefourrier's activity

published an article 4 months ago

Article

Fixing Open LLM Leaderboard with Math-Verify

By

and 3 others •

Feb 14

• 29

published an article 4 months ago

Article

The Open Arabic LLM Leaderboard 2

By

and 7 others •

Feb 10

• 32

published an article 4 months ago

Article

Open-source DeepResearch – Freeing our search agents

By

and 4 others •

Feb 4

• 1.25k

published an article 5 months ago

Article

CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard

By

and 3 others •

Jan 9

• 21

published an article 6 months ago

Article

Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard

By

and 4 others •

Dec 4, 2024

• 36

published an article 7 months ago

Article

Introduction to the Open Leaderboard for Japanese LLMs

Nov 20, 2024

• 36

published an article 7 months ago

Article

Letting Large Models Debate: The First Multilingual LLM Debate Competition

By

and 11 others •

Nov 20, 2024

• 30

published an article 7 months ago

Article

Judge Arena: Benchmarking LLMs as Evaluators

By

and 7 others •

Nov 19, 2024

• 57

published an article 8 months ago

Article

Introducing the Open FinLLM Leaderboard

By

and 12 others •

Oct 4, 2024

• 78

published an article 12 months ago

Article

BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks

By

and 8 others •

Jun 18, 2024

• 47

published an article about 1 year ago

Article

Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages

By

and 9 others •

May 24, 2024

• 27

published an article about 1 year ago

Article

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

By

and 15 others •

May 24, 2024

• 22

published an article about 1 year ago

Article

Let's talk about LLM evaluation

May 23, 2024

• 174

published an article about 1 year ago

Article

Introducing the Open Arabic LLM Leaderboard

By

and 4 others •

May 14, 2024

• 92

published an article about 1 year ago

Article

Introducing the Open Leaderboard for Hebrew LLMs!

By

and 3 others •

May 5, 2024

• 45

published an article about 1 year ago

Article

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

By

and 2 others •

May 3, 2024

• 13

published an article about 1 year ago

Article

Improving Prompt Consistency with Structured Generations

By

and 2 others •

Apr 30, 2024

• 64

published an article about 1 year ago

Article

Introducing the Open Chain of Thought Leaderboard

By

and 3 others •

Apr 23, 2024

• 34

published an article about 1 year ago

Article

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

By

and 2 others •

Apr 19, 2024

• 163

published an article about 1 year ago

Article

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

By

and 6 others •

Apr 16, 2024

• 15