DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 8 days ago • 265
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs Paper • 2412.21187 • Published about 1 month ago • 37
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated 24 days ago • 55
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 37
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38
neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 Text Generation • Updated Oct 10, 2024 • 9.46k • 19
HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1 Sentence Similarity • Updated 27 days ago • 27.5k • 32