While Llama 3.1 is truly impressive, especially 405B (which gives GPT-4o a run for its money! 💪),
I was surprised to see that on the Open LLM Leaderboard, Llama 3.1 70B was not able to dethrone the current king, Qwen2-72B!
Not only that, on a few benchmarks like MATH Lvl 5, it lagged far behind Qwen2-72B!
Also, the leaderboard scores are way off compared to the official numbers from Meta! 🤯
Based on the responses, I still believe Llama 3.1 will perform better than Qwen2 on LMSYS Chatbot Arena. But it still lags behind on too many benchmarks!
Open LLM Leaderboard: open-llm-leaderboard/open_llm_leaderboard
Hopefully, this is just an Open LLM Leaderboard error! @open-llm-leaderboard SOS! 🚨