---
license: apache-2.0
tags:
- gguf
- medgemma
- gemma3
- multimodal
- llama.cpp
---

# MedGemma-4B-IT GGUF (Multimodal)

This repository provides GGUF-formatted model files for `google/medgemma-4b-it`, designed for use with `llama.cpp`. MedGemma is a multimodal model based on Gemma-3, fine-tuned for the medical domain.

These GGUF files allow you to run the MedGemma model locally on your CPU, or to offload layers to a GPU if supported by your `llama.cpp` build (e.g., Metal on macOS, CUDA on Linux/Windows).

**For multimodal (vision) capabilities, you MUST use both a language model GGUF file AND the provided `mmproj` (multimodal projector) GGUF file.**

**Original Model:** [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it)

## Files Provided

Below are the GGUF files available in this repository. It is recommended to use the `F16` version of the `mmproj` file with any of the language model quantizations.

### Language Model GGUFs:

* **`medgemma-4b-it-F16.gguf`**
  * Quantization: F16 (16-bit floating point)
  * Size: ~7.77 GB
  * Use: Highest precision, best quality, largest file size.
* **`medgemma-4b-it-Q8_0.gguf`**
  * Quantization: Q8_0
  * Size: ~4.13 GB
  * Use: Excellent balance between model quality and file size/performance.

### Multimodal Projector GGUFs (Required for Image Input):

* **`mmproj-medgemma-4b-it-Q8_0.gguf`**
  * Quantization: Q8_0
  * Size: ~591 MB
  * Use: **This file is essential for image understanding.** It should be used alongside any of the language model GGUF files listed above.
* **`mmproj-medgemma-4b-it-F16.gguf`**
  * Quantization: F16 (recommended precision for the projector)
  * Size: ~851 MB
  * Use: **This file is essential for image understanding.** It should be used alongside any of the language model GGUF files listed above.

## How to Use

1. Download a language model GGUF and an `mmproj` GGUF from this repository (see the download sketch below).
2. Install `llama.cpp`: https://github.com/ggml-org/llama.cpp
3. Run the server, pointing it at both files: `llama-server -m ~/models/medgemma-4b-it-F16.gguf --mmproj ~/models/mmproj-medgemma-4b-it-F16.gguf -c 2048 --port 8080`
4. Query the model (see the request sketch below). An example visual chat front-end is available at https://github.com/kelkalot/medgemma-visual-chat.
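
### Downloading the Files

One way to fetch the files is with the `huggingface_hub` CLI (`pip install -U huggingface_hub`). The sketch below is illustrative only: the repository id is a placeholder you must replace with this repository's actual id, and `~/models` is an arbitrary target directory. You can also download the files manually from the repository's "Files and versions" tab.

```bash
# Placeholder repo id -- substitute the id of this repository on Hugging Face.
huggingface-cli download <user-or-org>/<this-repo> \
  medgemma-4b-it-F16.gguf \
  mmproj-medgemma-4b-it-F16.gguf \
  --local-dir ~/models
```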
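
### Querying the Server

Once `llama-server` is running, it exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the port given above. The following is a minimal sketch assuming the default `localhost:8080` address; for image input, recent `llama.cpp` server builds accept OpenAI-style `image_url` content parts (sent as base64 data URIs) when the server was started with `--mmproj`, so make sure your build is up to date. The file `xray.png` is a placeholder for your own image.

```bash
# Text-only request.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "List three common causes of chest pain."}
    ],
    "max_tokens": 256
  }'

# Image + text request (requires --mmproj; the image is sent as a base64 data URI).
IMG_B64=$(base64 < xray.png | tr -d '\n')
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": [
        {"type": "text", "text": "Describe the findings in this image."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG_B64"'"}}
      ]}
    ],
    "max_tokens": 256
  }'
```

The response is standard OpenAI-style JSON; the generated text is in `choices[0].message.content`.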