---
license: apache-2.0
tags:
- gguf
- medgemma
- gemma3
- multimodal
- llama.cpp
---

# MedGemma-4B-IT GGUF (Multimodal)

This repository provides GGUF-formatted model files for `google/medgemma-4b-it`, designed for use with `llama.cpp`. MedGemma is a multimodal model based on Gemma-3, fine-tuned for the medical domain.

These GGUF files allow you to run the MedGemma model locally on your CPU, or to offload layers to a GPU if supported by your `llama.cpp` build (e.g., Metal on macOS, CUDA on Linux/Windows).

**For multimodal (vision) capabilities, you MUST use both a language model GGUF file AND the provided `mmproj` (multimodal projector) GGUF file.**

**Original Model:** [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it)

## Files Provided

Below are the GGUF files available in this repository. It is recommended to use the `F16` version of the `mmproj` file with any of the language model quantizations.

### Language Model GGUFs:

* **`medgemma-4b-it-F16.gguf`**
  * Quantization: F16 (16-bit floating point)
  * Size: ~7.77 GB
  * Use: Highest precision, best quality, largest file size.
* **`medgemma-4b-it-Q8_0.gguf`**
  * Quantization: Q8_0
  * Size: ~4.13 GB
  * Use: Excellent balance between model quality and file size/performance.

### Multimodal Projector GGUFs (Required for Image Input):

* **`mmproj-medgemma-4b-it-Q8_0.gguf`**
  * Quantization: Q8_0
  * Size: ~591 MB
  * Use: **This file is essential for image understanding.** It should be used alongside any of the language model GGUF files listed above.
* **`mmproj-medgemma-4b-it-F16.gguf`**
  * Quantization: F16 (recommended precision for the projector)
  * Size: ~851 MB
  * Use: **This file is essential for image understanding.** It should be used alongside any of the language model GGUF files listed above.

## How to Use

1. Download a language model GGUF and an `mmproj` GGUF from this repository (see the download sketch below).
2. Install `llama.cpp`: https://github.com/ggml-org/llama.cpp
3. Run the server, pointing it at both files: `llama-server -m ~/models/medgemma-4b-it-F16.gguf --mmproj ~/models/mmproj-medgemma-4b-it-F16.gguf -c 2048 --port 8080`
4. Query the model (see the request sketch below). An example visual chat front-end is available at https://github.com/kelkalot/medgemma-visual-chat.
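
### Downloading the Files

One way to fetch the files is with the `huggingface_hub` CLI (`pip install -U huggingface_hub`). The sketch below is illustrative only: the repository id is a placeholder you must replace with this repository's actual id, and `~/models` is an arbitrary target directory. You can also download the files manually from the repository's "Files and versions" tab.

```bash
# Placeholder repo id -- substitute the id of this repository on Hugging Face.
huggingface-cli download <user-or-org>/<this-repo> \
  medgemma-4b-it-F16.gguf \
  mmproj-medgemma-4b-it-F16.gguf \
  --local-dir ~/models
```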
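
### Querying the Server

Once `llama-server` is running, it exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the port given above. The following is a minimal sketch assuming the default `localhost:8080` address; for image input, recent `llama.cpp` server builds accept OpenAI-style `image_url` content parts (sent as base64 data URIs) when the server was started with `--mmproj`, so make sure your build is up to date. The file `xray.png` is a placeholder for your own image.

```bash
# Text-only request.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "List three common causes of chest pain."}
    ],
    "max_tokens": 256
  }'

# Image + text request (requires --mmproj; the image is sent as a base64 data URI).
IMG_B64=$(base64 < xray.png | tr -d '\n')
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": [
        {"type": "text", "text": "Describe the findings in this image."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG_B64"'"}}
      ]}
    ],
    "max_tokens": 256
  }'
```

The response is standard OpenAI-style JSON; the generated text is in `choices[0].message.content`.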