---
license: apache-2.0
quantized_by: Pomni
language:
- en
- zh
- de
- es
- ru
- ko
- fr
- ja
- pt
- tr
- pl
- ca
- nl
- ar
- sv
- it
- id
- hi
- fi
- vi
- he
- uk
- el
- ms
- cs
- ro
- da
- hu
- ta
- 'no'
- th
- ur
- hr
- bg
- lt
- la
- mi
- ml
- cy
- sk
- te
- fa
- lv
- bn
- sr
- az
- sl
- kn
- et
- mk
- br
- eu
- is
- hy
- ne
- mn
- bs
- kk
- sq
- sw
- gl
- mr
- pa
- si
- km
- sn
- yo
- so
- af
- oc
- ka
- be
- tg
- sd
- gu
- am
- yi
- lo
- uz
- fo
- ht
- ps
- tk
- nn
- mt
- sa
- lb
- my
- bo
- tl
- mg
- as
- tt
- haw
- ln
- ha
- ba
- jw
- su
base_model:
- openai/whisper-large-v2
pipeline_tag: automatic-speech-recognition
tags:
- whisper.cpp
- ggml
- whisper
- audio
- speech
- voice
new_version: Pomni/whisper-large-v3-ggml-allquants
---

# Whisper-Large-v2 quants

This is a repository of **GGML quants for [whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)**, for use with [whisper.cpp](https://github.com/ggml-org/whisper.cpp).

If you are looking for a program to run this model with, I recommend [EasyWhisper UI](https://github.com/mehtabmahir/easy-whisper-ui): it is user-friendly, has a GUI, and automates a lot of the hard stuff for you.

## List of Quants

Clicking on a link will download the corresponding quant instantly.

| Link | Quant | Size | Notes |
|:-----|:-----|--------:|:------|
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-f32.bin) | F32 | 6.17 GB | Likely overkill. |
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-f16.bin) | F16 | 3.09 GB | Performs better than Q8_0 on noisy audio and music. |
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-q8_0.bin) | Q8_0 | 1.66 GB | Sweet spot: negligible quality loss at nearly double the speed. |
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-q6_k.bin) | Q6_K | 1.28 GB | |
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-q5_k.bin) | Q5_K | 1.08 GB | |
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-q5_1.bin) | Q5_1 | 1.18 GB | |
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-q5_0.bin) | Q5_0 | 1.08 GB | Last "good" quant; anything below loses quality rapidly. |
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-q4_k.bin) | Q4_K | 889 MB | *Might* not have lost too much quality, but I'm not sure. |
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-q4_1.bin) | Q4_1 | 985 MB | |
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-q4_0.bin) | Q4_0 | 889 MB | |
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-q3_k.bin) | Q3_K | 685 MB | |
| [GGML](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/resolve/main/ggml-large-v2-q2_k.bin) | Q2_K | 529 MB | Produces completely nonsensical outputs. |

The F16 quant was taken from [ggerganov/whisper.cpp/ggml-large-v2.bin](https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-large-v2.bin).
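## Running a quant with whisper.cpp

If you would rather use the command line than a GUI, the sketch below shows a typical whisper.cpp invocation. The binary name (`whisper-cli`; older releases called it `main`) and the `-m`/`-f`/`-ng` flags come from the whisper.cpp examples, but the file paths here are placeholders; adjust them to wherever you saved the quant.

```sh
# Transcribe a 16-bit, 16 kHz WAV file with the Q8_0 quant.
./whisper-cli -m ggml-large-v2-q8_0.bin -f audio.wav

# If a quant fails on your GPU (see the K-quants question below),
# -ng / --no-gpu falls back to CPU inference.
./whisper-cli -m ggml-large-v2-q5_k.bin -f audio.wav -ng
```

whisper.cpp expects 16-bit WAV input at 16 kHz; something like `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav` will convert most audio files into that shape.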
## Questions you may have

### Why do the "K-quants" not work for me?

My guess is that your GPU might be too old to support them; I have gotten the same error on my GTX 1080. If you would like to run them regardless, you can try switching to CPU inference (for example, with the `-ng` flag shown above).

### Are the K-quants "S", "M", or "L"?

The quantizer I used was not specific about this, so I do not know either.

### What program did you use to make these quants?

I used [whisper.cpp v1.7.6](https://github.com/ggml-org/whisper.cpp/releases/tag/v1.7.6) on Windows x64, leveraging CUDA 12.4.0. For the F32 quant, I converted the original Hugging Face (H5) format model to GGML using the `models/convert-h5-to-ggml.py` script. A rough sketch of these commands appears at the end of this page.

### One or more of the quants are not working for me.

[Open a new discussion](https://huggingface.co/Pomni/whisper-large-v2-ggml-allquants/discussions/new) in the community tab, and I will look into the issue.
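## Appendix: conversion and quantization commands

As referenced above, this is roughly what the conversion and quantization steps look like. The argument order for `convert-h5-to-ggml.py` (model directory, path to a checkout of the openai/whisper repository, output directory) and the `quantize` tool both come from the whisper.cpp repository, but the exact paths and script behavior can vary between versions, so treat this as a sketch rather than a recipe.

```sh
# Convert the Hugging Face (H5) model to a GGML file.
# Requires local checkouts of openai/whisper-large-v2 and openai/whisper.
python models/convert-h5-to-ggml.py /path/to/whisper-large-v2/ /path/to/whisper/ ./models/

# Quantize the converted model down to a smaller format, e.g. Q8_0.
./quantize ./models/ggml-model.bin ./models/ggml-large-v2-q8_0.bin q8_0
```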