Qwen/Qwen3-30B-A3B - GGUF

This repository contains GGUF quantizations of Qwen/Qwen3-30B-A3B (a 30.5B-parameter model using the qwen3moe architecture).

About GGUF

GGUF is the model file format used by llama.cpp. The quantized files in this repository store the model weights at reduced precision, which shrinks their size and memory footprint so that large language models can run on consumer hardware.

Files

Filename           Quant type   File Size   Description
model-f16.gguf     f16          Large       Original 16-bit precision
model-q4_0.gguf    Q4_0         Small       4-bit quantization
model-q4_1.gguf    Q4_1         Small       4-bit quantization (higher quality than Q4_0)
model-q5_0.gguf    Q5_0         Medium      5-bit quantization
model-q5_1.gguf    Q5_1         Medium      5-bit quantization (higher quality than Q5_0)
model-q8_0.gguf    Q8_0         Large       8-bit quantization
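
To fetch a single quant without cloning the whole repository, the huggingface-cli tool can be used. This is a sketch assuming huggingface_hub is installed (pip install -U huggingface_hub); substitute whichever filename from the table above you need:

# download just the Q4_0 file from this repository into the current directory
huggingface-cli download ReallyFloppyPenguin/Qwen3-30B-A3B-GGUF model-q4_0.gguf --local-dir .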

Usage

You can use these models with llama.cpp or any other GGUF-compatible inference engine.

llama.cpp

# run a one-shot prompt against the Q4_0 quant
./llama-cli -m model-q4_0.gguf -p "Your prompt here"
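
For interactive use or GPU offload, recent llama.cpp builds accept flags such as -ngl (number of layers to offload to the GPU), -c (context size), and -cnv (conversation mode). The values below are illustrative assumptions, not tuned settings for this model:

# offload all layers to the GPU, use a 4096-token context, and chat interactively
./llama-cli -m model-q4_0.gguf -ngl 99 -c 4096 -cnv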

Python (using llama-cpp-python)

from llama_cpp import Llama

# load the quantized model; Llama() also accepts options such as n_ctx (context size)
# and n_gpu_layers (GPU offload) if the defaults are too conservative
llm = Llama(model_path="model-q4_0.gguf")

# run a plain text completion and print the generated text
output = llm("Your prompt here", max_tokens=512)
print(output['choices'][0]['text'])

Original Model

This is a quantized version of Qwen/Qwen3-30B-A3B. Please refer to the original model card for more information about the model's capabilities, training data, and usage guidelines.

Conversion Details

  • Converted using llama.cpp (a typical conversion workflow is sketched after this list)
  • Original model downloaded from Hugging Face
  • Multiple quantization levels provided for different use cases
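
The exact commands and llama.cpp revision used for this repository are not recorded here; a typical conversion workflow looks roughly like the following sketch (paths are placeholders):

# convert the original Hugging Face checkpoint to a 16-bit GGUF
python convert_hf_to_gguf.py /path/to/Qwen3-30B-A3B --outtype f16 --outfile model-f16.gguf

# quantize the 16-bit GGUF down to one of the smaller formats
./llama-quantize model-f16.gguf model-q4_0.gguf Q4_0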

License

This model inherits the license from the original model. Please check the original model's license for usage terms.
