Today, we spoke with Snowflake’s AI Research Team Leads, Yuxiong He and Samyam Rajbhandari (@samyam), who is also one of the researchers behind DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference (2401.08671) and other DeepSpeed papers. Collaborating with their co-authors to reduce inference costs for enterprise-specific tasks, they observed that inputs are often significantly larger than outputs. This is because enterprises typically analyze enormous amounts of information to extract valuable insights, which are much shorter. To address this, they developed SwiftKV (SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation, 2410.03960), an optimization that reduces LLM inference costs by up to 75% for Meta Llama LLMs, enhancing efficiency and performance in enterprise AI tasks.
Today they are open-sourcing SwiftKV (Snowflake/Llama-3.1-SwiftKV-8B-Instruct) and the ArcticTraining platform. In our new episode of "15 minutes with a Researcher," they explain how SwiftKV works, its applicability to other architectures, its limitations, and additional methods to further reduce inference computation costs. Watch the full 15-minute interview here (https://youtu.be/9x1k7eXe-6Q?si=4_HQOyi1CPHgvlrx).
Almost every AI researcher has read or written a large number of AI research papers. So it's quite logical that researchers are trying to create AI systems to help conduct research. Scientific research could become much easier and more varied with LLMs and AI assistants tailored for this purpose. Just imagine how interesting it would be to read high-quality AI research produced by an AI agent.
Today, we invite you to explore these 10 AI systems for scientific research: