In these quantized versions of the model, most layers are shrunk to save space using MXFP4. The main difference between the two builds is how the "gate" layers inside the experts' feed-forward blocks (ffn_gate_exps.weight) are quantized:

  • Q4_1 version (≈12 GB): the gate layers are stored in 4-bit Q4_1, which keeps the file smaller and a little faster to run, but with slightly less precision.
  • Q8_0 version (≈15 GB): the gate layers are stored in 8-bit Q8_0, which keeps more detail and gives slightly more accurate outputs, at the cost of a bigger file and a bit more compute.

All other layers are treated the same in both versions.
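
If you want to check which quantization each tensor uses, you can inspect a downloaded file with the `gguf` Python package that ships with llama.cpp (`pip install gguf`). This is a minimal sketch, not part of the release; the filename is only a placeholder for whichever build you downloaded.

```python
from gguf import GGUFReader

# Placeholder path: substitute the actual file you downloaded from this repo.
reader = GGUFReader("jinx-gpt-oss-20b-Q8_0.gguf")

# Collect the quantization type of the expert gate tensors (expected: Q4_1 in
# the ~12 GB build, Q8_0 in the ~15 GB build) and of all remaining tensors.
gate_types = set()
other_types = set()
for tensor in reader.tensors:
    if "ffn_gate_exps" in tensor.name:
        gate_types.add(tensor.tensor_type.name)
    else:
        other_types.add(tensor.tensor_type.name)

print("ffn_gate_exps tensors:", sorted(gate_types))
print("all other tensors:   ", sorted(other_types))
```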

Format: GGUF
Model size: 20.9B params
Architecture: gpt-oss
Model tree for marcelone/jinx-gpt-oss-20b-gguf
Base model: openai/gpt-oss-20b → quantized → this model