่ฐข้›†

sanaka87

AI & ML interests

Image Generation

Recent Activity

reacted to m-ric's post with 👀 about 23 hours ago
๐— ๐—ถ๐—ป๐—ถ๐— ๐—ฎ๐˜…'๐˜€ ๐—ป๐—ฒ๐˜„ ๐— ๐—ผ๐—˜ ๐—Ÿ๐—Ÿ๐—  ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต๐—ฒ๐˜€ ๐—–๐—น๐—ฎ๐˜‚๐—ฑ๐—ฒ-๐—ฆ๐—ผ๐—ป๐—ป๐—ฒ๐˜ ๐—น๐—ฒ๐˜ƒ๐—ฒ๐—น ๐˜„๐—ถ๐˜๐—ต ๐Ÿฐ๐—  ๐˜๐—ผ๐—ธ๐—ฒ๐—ป๐˜€ ๐—ฐ๐—ผ๐—ป๐˜๐—ฒ๐˜…๐˜ ๐—น๐—ฒ๐—ป๐—ด๐˜๐—ต ๐Ÿ’ฅ This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach. ๐—ž๐—ฒ๐˜† ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€: ๐Ÿ—๏ธ MoE with novel hybrid attention: โ€ฃ Mixture of Experts with 456B total parameters (45.9B activated per token) โ€ฃ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers ๐Ÿ† Outperforms leading models across benchmarks while offering vastly longer context: โ€ฃ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks โ€ฃ Can efficiently handle 4M token contexts (vs 256K for most other LLMs) ๐Ÿ”ฌ Technical innovations enable efficient scaling: โ€ฃ Novel expert parallel and tensor parallel strategies cut communication overhead in half โ€ฃ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%) ๐ŸŽฏ Thorough training strategy: โ€ฃ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge! Overall, not only is the model impressive, but the technical paper is also really interesting! ๐Ÿ“ It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs. Read it in full here ๐Ÿ‘‰ https://huggingface.co/papers/2501.08313 Model here, allows commercial use <100M monthly users ๐Ÿ‘‰ https://huggingface.co/MiniMaxAI/MiniMax-Text-01
updated a model about 23 hours ago
sanaka87/3DIS

Organizations

None yet