Besides being the coolest-named benchmark in the game, HellaSwag is an important measure of здравый смысл (common sense) in LLMs.
- More on HellaSwag: https://github.com/rowanz/hellaswag
I spent the afternoon benchmarking YandexGPT Pro 4th Gen, one of the Russian tech giant's premier models.
- Yandex HF Org: https://huggingface.co/yandex
- More on Yandex models: https://yandex.cloud/ru/docs/foundation-models/concepts/yandexgpt/models
The eval notebook is available on GitHub and the resulting dataset is already on the HF Hub!
- Eval Notebook: https://github.com/kghamilton89/ai-explorer/blob/main/yandex-hellaswag/hellaswag-assess.ipynb
- Eval Dataset: ZennyKenny/yandexgptpro_4th_gen-hellaswag
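For anyone curious what a zero-shot HellaSwag pass looks like under the hood, here's a minimal sketch. It assumes records with the public dataset's `ctx`, `endings`, and `label` fields; `choose_ending` is a hypothetical stand-in for the actual model call (the notebook above does the real thing), here replaced by a trivial longest-ending heuristic just to make the harness runnable.

```python
# Zero-shot HellaSwag-style scoring sketch.
# Assumption: each record has `ctx` (context), `endings` (4 candidate
# continuations), and `label` (index of the correct ending), mirroring
# the fields in the public HellaSwag dataset.

def format_prompt(record):
    """Build a zero-shot multiple-choice prompt from one record."""
    options = "\n".join(f"{i}. {e}" for i, e in enumerate(record["endings"]))
    return f"{record['ctx']}\n\nWhich ending is most plausible?\n{options}"

def choose_ending(record):
    # Placeholder for an LLM call: in a real eval you would send
    # format_prompt(record) to the model and parse its chosen index.
    # Dummy heuristic here: pick the longest ending.
    return max(range(len(record["endings"])),
               key=lambda i: len(record["endings"][i]))

def accuracy(records):
    """Fraction of records where the chosen ending matches the label."""
    correct = sum(choose_ending(r) == int(r["label"]) for r in records)
    return correct / len(records)
```

Swapping `choose_ending` for a real model call (and batching the requests) is essentially all the notebook adds on top of this loop.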
And of course, everyone wants to see the numbers, so have a look at the results in the context of other zero-shot experiments I was able to find!