ccocks-deca
posted an update Sep 3
📒 New Dataset: OpenSynth Battles

We've released OpenSynth Battles, a benchmark dataset of generations from five large language models on a shared set of prompts. For each prompt, the dataset includes:

- Responses from gpt-oss-120b, deepseek-v3.1-thinking, deepseek-v3.1-instruct, moonshotai/kimi-k2-instruct, and deepseek-r1-0528
- Automated scoring by gpt-oss-120b

It is useful for model comparison, automated-evaluation research, and prompt-level performance analysis.
No data splits are included.
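The post doesn't include usage code, so here is a minimal sketch of the kind of prompt-level comparison it mentions. It assumes a hypothetical layout of one record per prompt with a `scores` mapping from model name to judge score. The field names, the placeholder models `model_a`/`model_b`/`model_c`, the helper functions, and the toy score values are all invented for illustration. None of them are taken from the dataset.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical layout: one record per prompt, with one judge score per model.
# The real column names in ccocks-deca/open-synth-battles may differ;
# check the dataset card and adapt the keys below. Scores are toy values.
records = [
    {"prompt": "p1", "scores": {"model_a": 8.0, "model_b": 6.0, "model_c": 7.0}},
    {"prompt": "p2", "scores": {"model_a": 5.0, "model_b": 9.0, "model_c": 9.0}},
    {"prompt": "p3", "scores": {"model_a": 7.0, "model_b": 4.0, "model_c": 6.0}},
]


def mean_scores(rows):
    """Average judge score per model across all prompts."""
    per_model = defaultdict(list)
    for row in rows:
        for model, score in row["scores"].items():
            per_model[model].append(score)
    return {model: mean(vals) for model, vals in per_model.items()}


def prompt_wins(rows):
    """Count prompts where a model has the single highest score (ties award no win)."""
    wins = Counter()
    for row in rows:
        best = max(row["scores"].values())
        leaders = [m for m, s in row["scores"].items() if s == best]
        if len(leaders) == 1:
            wins[leaders[0]] += 1
    return wins


if __name__ == "__main__":
    print(mean_scores(records))
    print(prompt_wins(records))
```

To try this on the real data, the rows could be fetched with `datasets.load_dataset("ccocks-deca/open-synth-battles")` and their actual columns mapped onto this layout. The two views can disagree. In the toy data, `model_c` has the highest mean score, while `model_a` wins the most prompts outright. That gap is one reason prompt-level analysis is worth doing. Treating ties as no-win is just one choice; splitting the win among tied models is another.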

🔗 https://huggingface.co/datasets/ccocks-deca/open-synth-battles