---
library_name: onnxruntime_genai
license: apache-2.0
language:
- en
- bn
- hi
- kn
- gu
- mr
- ml
- or
- pa
- ta
- te
tags:
- mistral3
- indic
- onnx
- onnxruntime-genai
- sarvam
- text-generation-inference
- cuda
base_model_relation: quantized
base_model:
- sarvamai/sarvam-m
---

# Sarvam-M

Chat on Sarvam Playground

# Model Information

`sarvam-m` is a multilingual, hybrid-reasoning, text-only language model built on Mistral-Small. This post-trained version delivers exceptional improvements over the base model:

- +20% average improvement on Indian language benchmarks
- +21.6% enhancement on math benchmarks
- +17.6% boost on programming benchmarks

Performance gains are even more impressive at the intersection of Indian languages and mathematics, with an outstanding +86% improvement on romanized Indian language GSM-8K benchmarks.

Learn more about `sarvam-m` in our detailed [blog post](https://www.sarvam.ai/blogs/sarvam-m).

# Key Features

- **Hybrid Thinking Mode**: A single versatile model supporting both "think" and "non-think" modes. Use think mode for complex logical reasoning, mathematical problems, and coding tasks, or switch to non-think mode for efficient, general-purpose conversation.
- **Advanced Indic Skills**: Specifically post-trained on Indian languages alongside English, embodying a character that authentically reflects and emphasizes Indian cultural values.
- **Superior Reasoning Capabilities**: Outperforms most similarly sized models on coding and math benchmarks, demonstrating exceptional reasoning abilities.
- **Seamless Chatting Experience**: Full support for both Indic scripts and romanized versions of Indian languages, providing a smooth and accessible multilingual conversation experience.

# Conversion

The original model was converted to ONNX using [ONNX Runtime GenAI](https://github.com/microsoft/onnxruntime-genai), developed by Microsoft.

# Quickstart

The following code snippet demonstrates how to run `sarvam-m` with ONNX Runtime GenAI.

> [!NOTE]
> For thinking mode, we recommend `temperature=0.5`; for non-think mode, `temperature=0.2`.
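The snippet below is a minimal sketch rather than an official example: the local model directory, the prompt handling, and the search options are assumptions, and the exact generator API may vary slightly across `onnxruntime-genai` versions.

```python
# Minimal sketch, assuming the converted ONNX model files have been downloaded
# to a local directory; adjust the path, prompt, and search options as needed.
import onnxruntime_genai as og

model_dir = "./sarvam-m-onnx"  # assumed path to the converted ONNX model

model = og.Model(model_dir)
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

# For best results, format the prompt with the original model's chat template.
prompt = "Who are you and what is your purpose on this planet?"

params = og.GeneratorParams(model)
# temperature=0.5 for thinking mode; use 0.2 for non-think mode
params.set_search_options(do_sample=True, temperature=0.5, max_length=2048)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Stream the decoded tokens as they are generated
while not generator.is_done():
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)
print()
```

When thinking mode is enabled, the model emits its reasoning between `<think>` and `</think>` tags before the final answer; you can split the output on `</think>` if you want to separate the reasoning from the response.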