exllamav2 quantizations of TheDrummer's Behemoth-R1-123B-v2
2.25bpw h6 (32.964 GiB) (uploading)
4.25bpw h6 (61.324 GiB)
8.00bpw h8 (114.559 GiB)
measurement.json
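To pull one of these quants locally, here is a minimal sketch using huggingface_hub; the local directory is illustrative, and if the individual quants are published on separate branches you would pass the matching `revision` (branch names are not listed here, so that value is an assumption).

```python
# Minimal sketch: download this repo with huggingface_hub.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="MikeRoz/Behemoth-R1-123B-v2-exl2",
    local_dir="/models/Behemoth-R1-123B-v2-exl2",  # assumption: any local path works
    # revision="<branch for the desired bpw>",      # only if quants live on separate branches
)
print(local_path)
```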
The 2.25bpw quant will load with 28k of fp16 context on two 24 GB GPUs, or 89k of fp16 context on three 24 GB GPUs.
The 4.25bpw quant will squeeze onto three 24 GB GPUs with 16k of fp16 context, or load with 73k of fp16 context on four 24 GB GPUs.
The 8.00bpw quant requires six 24 GB GPUs (or equivalent VRAM).
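As a rough guide to loading, here is a minimal sketch using the exllamav2 Python API; the model path and context length are illustrative (16k shown for the 4.25bpw case above), and `load_autosplit` spreads the weights across whatever GPUs are visible.

```python
# Minimal sketch, assuming the quant has been downloaded to the path below.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/models/Behemoth-R1-123B-v2-exl2")  # assumption: local download path
config.max_seq_len = 16384  # e.g. 16k context for the 4.25bpw quant on three 24 GB cards

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # fp16 cache, allocated during the autosplit load
model.load_autosplit(cache)               # spread layers across all visible GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Hello,", max_new_tokens=32))
```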
Model tree for MikeRoz/Behemoth-R1-123B-v2-exl2: these quants are made from TheDrummer/Behemoth-R1-123B-v2, a finetune of the base model mistralai/Mistral-Large-Instruct-2411.