Article 🐺🐦⬛ LLM Comparison/Test: Phi-4, Qwen2 VL 72B Instruct, Aya Expanse 32B in my updated MMLU-Pro CS benchmark By wolfram • 20 days ago • 4
LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context Paper • 2412.17596 • Published Dec 23, 2024 • 6