📢 New Dataset: OpenSynth Battles
We've released OpenSynth Battles, a benchmark dataset of generations from five large language models on a shared set of prompts. For each prompt, the dataset includes:
Responses from gpt-oss-120b, deepseek-v3.1-thinking, deepseek-v3.1-instruct, moonshotai/kimi-k2-instruct, and deepseek-r1-0528
Automated scores assigned by gpt-oss-120b
Useful for model comparison, automated evaluation research, and prompt-level performance analysis.
No predefined data splits (e.g. train/test) are included.
🔗 https://huggingface.co/datasets/ccocks-deca/open-synth-battles