Pratik Bhavsar PRO

pratikbhavsar

https://pakodas.substack.com

AI & ML interests

LLM agents, evaluation & reasoning

Recent Activity

commented on their article 4 days ago

Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios

updated a Space 7 days ago

galileo-ai/agent-leaderboard

pratikbhavsar's activity

commented on Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios 4 days ago

This is excellent, @ngxson. No errors and very crisp.

commented on Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios 4 days ago

Thank you Erin! We will continue to update this further with more LLMs :)

updated a Space 7 days ago

Agent Leaderboard

Ranking of LLMs for agentic tasks

liked a dataset 9 days ago

galileo-ai/agent-leaderboard

Viewer • Updated 10 days ago • 1.28k • 312 • 14

published an article 9 days ago

Article

Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios

By

and 1 other •

9 days ago

• 12

updated a dataset 10 days ago

galileo-ai/agent-leaderboard

Viewer • Updated 10 days ago • 1.28k • 312 • 14

published a dataset 10 days ago

galileo-ai/agent-leaderboard

Viewer • Updated 10 days ago • 1.28k • 312 • 14

liked a dataset 16 days ago

gorilla-llm/Berkeley-Function-Calling-Leaderboard

Preview • Updated 7 days ago • 1.14k • 60

updated a Space 17 days ago

Agent Leaderboard

Ranking of LLMs for agentic tasks

upvoted an article 22 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

25 days ago

• 768

liked 5 datasets 22 days ago

bespokelabs/Bespoke-Stratos-17k

Viewer • Updated 22 days ago • 16.7k • 92.5k • 277

open-thoughts/OpenThoughts-114k

Viewer • Updated 1 day ago • 228k • 97.4k • 571

PrimeIntellect/NuminaMath-QwQ-CoT-5M

Viewer • Updated 30 days ago • 5.14M • 3.73k • 48

ServiceNow-AI/R1-Distill-SFT

Viewer • Updated 13 days ago • 1.85M • 6.91k • 253

cognitivecomputations/dolphin-r1

Viewer • Updated 22 days ago • 814k • 5.35k • 261

upvoted a collection 22 days ago

🧠 Reasoning datasets

Datasets with reasoning traces for math and code released by the community • 12 items • Updated 1 day ago • 77

liked a Space 23 days ago

Agent Leaderboard

Ranking of LLMs for agentic tasks

New activity in deepseek-ai/DeepSeek-R1-Distill-Qwen-32B 23 days ago

Does this have tooling support?

#7 opened about 1 month ago by

published a Space 29 days ago

Agent Leaderboard

Ranking of LLMs for agentic tasks

upvoted a collection 6 months ago

⛈️ Llama-3.1 Storm Models

Fine-tuned Llama 3.1 8B model with superior reasoning, conversation abilities, and function calling! • 3 items • Updated Aug 25, 2024 • 15