Commit 84c9c9b
Parent(s): d5252e8
update claude
auto_Mind2Web-Online - Leaderboard_data.csv
CHANGED
```diff
@@ -2,5 +2,6 @@ Agent,Model,Organization,Source,Easy,Medium,Hard,Average SR,Date
 Operator,OpenAI Computer-Using Agent,OpenAI,OSU NLP,80.3,73.4,59,71.8,2025-3-22
 SeeAct,gpt-4o-2024-08-06,OSU,OSU NLP,65.1,36.1,18.5,39.8,2025-3-22
 Browser Use,gpt-4o-2024-08-06,Browser Use,OSU NLP,58.6,37.5,24.3,40.1,2025-3-22
-Claude Computer Use,claude-3-5-sonnet-20241022,Anthropic,OSU NLP,61.9,28.1,21.2,35.8,2025-3-22
-Agent-E,gpt-4o-2024-08-06,Emergence AI,OSU NLP,57.4,31.9,14.4,34.7,2025-3-22
+Claude Computer Use 3.5,claude-3-5-sonnet-20241022,Anthropic,OSU NLP,61.9,28.1,21.2,35.8,2025-3-22
+Agent-E,gpt-4o-2024-08-06,Emergence AI,OSU NLP,57.4,31.9,14.4,34.7,2025-3-22
+Claude Computer Use 3.7 (w/o thinking),Claude-3-7-sonnet-20250219,Anthropic,OSU NLP,81.5,56.2,42,59.7,2025-4-20
```
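The Average SR column is not the plain mean of Easy, Medium and Hard: for Operator, (80.3 + 73.4 + 59) / 3 = 70.9, while the file reports 71.8. The minimal Python sketch below checks whether the reported values are consistent with a task-weighted mean. It assumes 83 easy, 143 medium and 74 hard tasks (300 in total); these counts do not appear in this commit and are an assumption here. The 0.1-point tolerance allows for the per-level percentages having already been rounded to one decimal.

```python
import csv
import io

# Updated rows of auto_Mind2Web-Online - Leaderboard_data.csv, copied from the diff.
AUTO_CSV = """\
Agent,Model,Organization,Source,Easy,Medium,Hard,Average SR,Date
Operator,OpenAI Computer-Using Agent,OpenAI,OSU NLP,80.3,73.4,59,71.8,2025-3-22
SeeAct,gpt-4o-2024-08-06,OSU,OSU NLP,65.1,36.1,18.5,39.8,2025-3-22
Browser Use,gpt-4o-2024-08-06,Browser Use,OSU NLP,58.6,37.5,24.3,40.1,2025-3-22
Claude Computer Use 3.5,claude-3-5-sonnet-20241022,Anthropic,OSU NLP,61.9,28.1,21.2,35.8,2025-3-22
Agent-E,gpt-4o-2024-08-06,Emergence AI,OSU NLP,57.4,31.9,14.4,34.7,2025-3-22
Claude Computer Use 3.7 (w/o thinking),Claude-3-7-sonnet-20250219,Anthropic,OSU NLP,81.5,56.2,42,59.7,2025-4-20
"""

# ASSUMPTION: number of tasks per difficulty level (not stated in this commit).
TASK_COUNTS = {"Easy": 83, "Medium": 143, "Hard": 74}


def weighted_sr(row: dict) -> float:
    """Task-weighted mean of the per-level success rates, in percent."""
    total = sum(TASK_COUNTS.values())
    return sum(n * float(row[level]) for level, n in TASK_COUNTS.items()) / total


def simple_mean_sr(row: dict) -> float:
    """Unweighted mean of Easy/Medium/Hard, shown only for comparison."""
    return sum(float(row[level]) for level in TASK_COUNTS) / len(TASK_COUNTS)


rows = list(csv.DictReader(io.StringIO(AUTO_CSV)))

for row in rows:
    reported = float(row["Average SR"])
    print(f"{row['Agent']:<40} reported={reported:5.1f} "
          f"weighted={weighted_sr(row):5.1f} simple={simple_mean_sr(row):5.1f}")
```

Under these assumed counts, every row's weighted mean lands within 0.1 of the reported Average SR, whereas the unweighted mean misses Operator's value by almost a full point.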
human_Mind2Web-Online - Leaderboard_data.csv
CHANGED
```diff
@@ -2,5 +2,6 @@ Agent,Model,Organization,Source,Easy,Medium,Hard,Average SR,Date
 Operator,OpenAI Computer-Using Agent,OpenAI,OSU NLP,83.1,58.0,43.2,61.3,2025-3-22
 SeeAct,gpt-4o-2024-08-06,OSU,OSU NLP,60.2,25.2,8.1,30.7,2025-3-22
 Browser Use,gpt-4o-2024-08-06,Browser Use,OSU NLP,55.4,26.6,8.1,30.0,2025-3-22
-Claude Computer Use,claude-3-5-sonnet-20241022,Anthropic,OSU NLP,56.6,20.3,14.9,29.0,2025-3-22
-Agent-E,gpt-4o-2024-08-06,Emergence AI,OSU NLP,49.4,26.6,6.8,28.0,2025-3-22
+Claude Computer Use 3.5,claude-3-5-sonnet-20241022,Anthropic,OSU NLP,56.6,20.3,14.9,29.0,2025-3-22
+Agent-E,gpt-4o-2024-08-06,Emergence AI,OSU NLP,49.4,26.6,6.8,28.0,2025-3-22
+Claude Computer Use 3.7 (w/o thinking),Claude-3-7-sonnet-20250219,Anthropic,OSU NLP,90.4,49.0,32.4,56.3,2025-4-20
```
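The file is not ordered by score: the new Claude 3.7 row is appended as row 7 even though its human-evaluated Average SR (56.3) is the second highest. The short Python sketch below produces a score-ordered view of the human-evaluated rows. Ranking by Average SR in descending order is an assumption about how a reader would want the table presented; the leaderboard's own rendering code is not part of this commit.

```python
import csv
import io

# Updated rows of human_Mind2Web-Online - Leaderboard_data.csv, copied from the diff.
HUMAN_CSV = """\
Agent,Model,Organization,Source,Easy,Medium,Hard,Average SR,Date
Operator,OpenAI Computer-Using Agent,OpenAI,OSU NLP,83.1,58.0,43.2,61.3,2025-3-22
SeeAct,gpt-4o-2024-08-06,OSU,OSU NLP,60.2,25.2,8.1,30.7,2025-3-22
Browser Use,gpt-4o-2024-08-06,Browser Use,OSU NLP,55.4,26.6,8.1,30.0,2025-3-22
Claude Computer Use 3.5,claude-3-5-sonnet-20241022,Anthropic,OSU NLP,56.6,20.3,14.9,29.0,2025-3-22
Agent-E,gpt-4o-2024-08-06,Emergence AI,OSU NLP,49.4,26.6,6.8,28.0,2025-3-22
Claude Computer Use 3.7 (w/o thinking),Claude-3-7-sonnet-20250219,Anthropic,OSU NLP,90.4,49.0,32.4,56.3,2025-4-20
"""


def rank_by_average_sr(csv_text: str) -> list[tuple[int, str, float]]:
    """Return (rank, agent, Average SR) tuples, highest Average SR first.

    Assumption: ranking by Average SR alone; ties would keep file order
    because sorted() is stable.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    ordered = sorted(rows, key=lambda r: float(r["Average SR"]), reverse=True)
    return [(i, r["Agent"], float(r["Average SR"])) for i, r in enumerate(ordered, start=1)]


ranking = rank_by_average_sr(HUMAN_CSV)
for rank, agent, sr in ranking:
    print(f"{rank}. {agent}: {sr:.1f}")
```

Sorting at read time leaves the CSV append-only, so a future commit can add a row at the end without reordering existing lines.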
human_label.json
CHANGED
The diff for this file is too large to render.