✨ Launched All-Scenario Reasoning Model (language, visual, and search reasoning capabilities) , with medical expertise as one of its key highlights. https://ying.baichuan-ai.com/chat
✨ Released Baichuan-M1-14B Medical LLM on the hub Available in both Base and Instruct versions, support English & Chinese.
What happened yesterday in the Chinese AI community? 🚀
T2A-01-HD 👉 https://hailuo.ai/audio MiniMax's Text-to-Audio model, now in Hailuo AI, offers 300+ voices in 17+ languages and instant emotional voice cloning.
Tare 👉 https://www.trae.ai/ A new coding tool by Bytedance for professional developers, supporting English & Chinese with free access to Claude 3.5 and GPT-4 for a limited time.
Kimi K 1.5 👉 https://github.com/MoonshotAI/Kimi-k1.5 | https://kimi.ai/ An O1-level multi-modal model by MoonShot AI, utilizing reinforcement learning with long and short-chain-of-thought and supporting up to 128k tokens.
And today…
Hunyuan 3D-2.0 👉 tencent/Hunyuan3D-2 A SoTA 3D synthesis system for high-res textured assets by Tencent Hunyuan , with open weights and code!
✨ MIT License : enabling distillation for custom models ✨ 32B & 70B models match OpenAI o1-mini in multiple capabilities ✨ API live now! Access Chain of Thought reasoning with model='deepseek-reasoner'
InternLM3-8B-instruct🔥 Trained on just 4T tokens, it outperforms Llama3.1-8B and Qwen2.5-7B in reasoning tasks, at 75% lower cost! internlm/internlm3-67875827c377690c01a9131d
✨ MiniMax-text-01: - 456B with 45.9B activated per token - Combines Lightning Attention, Softmax Attention, and MoE for optimal performance - Training context up to 1M tokens, inference handles 4M tokens
✨ MiniMax-VL-01: - ViT-MLP-LLM framework ( non-transformer👀) - Handles image inputs from 336×336 to 2016×2016 - 694M image-caption pairs + 512B tokens processed across 4 stages
MiniCPM-o2.6 🔥 an end-side multimodal LLMs released by OpenBMB from the Chinese community Model: openbmb/MiniCPM-o-2_6 ✨ Real-time English/Chinese conversation, emotion control and ASR/STT ✨ Real-time video/audio understanding ✨ Processes up to 1.8M pixels, leads OCRBench & supports 30+ languages
QvQ-72B-Preview🎄 an open weight model for visual reasoning just released by Alibaba_Qwen team Qwen/qvq-676448c820912236342b9888 ✨ Combines visual understanding & language reasoning. ✨ Scores 70.3 on MMMU ✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
Megrez-3B-Omni 🔥 an on-device multimodal LLM by Infinigence AI, another startup emerging from the Tsinghua University ecosystem. Model: Infinigence/Megrez-3B-Omni Demo: Infinigence/Megrez-3B-Omni ✨Supports analysis of image, text, and audio modalities ✨Leads in bilingual speech ( English & Chinese ) input, multi-turn conversations, and voice-based queries ✨Outperforms in scene understanding and OCR across major benchmarks