Sparse MPT-7B-Chat - DeepSparse

Chat-aligned MPT 7b model pruned to 50% and quantized using SparseGPT for inference with DeepSparse

from deepsparse import TextGeneration
model = TextGeneration(model="hf:neuralmagic/mpt-7b-chat-pruned50-quant")
model("Tell me a joke.", max_new_tokens=50)

Downloads last month: 28

Inference Providers NEW

Text Generation

This model is not currently available via any of the supported Inference Providers.

The model cannot be deployed to the HF Inference API: The HF Inference API does not support model that require custom code execution.

Space using neuralmagic/mpt-7b-chat-pruned50-quant-ds 1

Collection including neuralmagic/mpt-7b-chat-pruned50-quant-ds

DeepSparse Sparse LLMs

Collection

Useful LLMs for DeepSparse where we've pruned at least 50% of the weights! • 10 items • Updated Sep 26, 2024 • 5