Update README.md
README.md CHANGED
---
license: apache-2.0

tags:
- gguf
- medgemma
- gemma3
- multimodal
- llama.cpp
---

# MedGemma-4B-IT GGUF (Multimodal)

This repository provides GGUF-formatted model files for `google/medgemma-4b-it`, designed for use with `llama.cpp`. MedGemma is a multimodal model based on Gemma-3, fine-tuned for the medical domain.

These GGUF files allow you to run the MedGemma model locally on your CPU, or offload layers to a GPU if supported by your `llama.cpp` build (e.g., Metal on macOS, CUDA on Linux/Windows).

**For multimodal (vision) capabilities, you MUST use both a language model GGUF file AND the provided `mmproj` (multimodal projector) GGUF file.**

**Original Model:** [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it)

## Files Provided

Below are the GGUF files available in this repository. It is recommended to use the `F16` version of the `mmproj` file with any of the language model quantizations.

### Language Model GGUFs:

* **`medgemma-4b-it-F16.gguf`**:
  * Quantization: F16 (16-bit floating point)
  * Size: ~7.77 GB (verify against the actual file size)
  * Use: Highest precision and best quality; largest file size.
* **`medgemma-4b-it-Q8_0.gguf`**:
  * Quantization: Q8_0
  * Size: ~4.13 GB (verify against the actual file size)
  * Use: Excellent balance between model quality and file size/performance.

### Multimodal Projector GGUFs (Required for Image Input):

* **`mmproj-medgemma-4b-it-Q8_0.gguf`**:
  * Quantization: Q8_0
  * Size: ~591 MB
  * Use: **This file is essential for image understanding.** It should be used alongside any of the language model GGUF files listed above.
* **`mmproj-medgemma-4b-it-F16.gguf`**:
  * Quantization: F16 (recommended precision for the projector)
  * Size: ~851 MB
  * Use: **This file is essential for image understanding.** It should be used alongside any of the language model GGUF files listed above.

## How to use?

Download the language model and `mmproj` GGUF files, for example as shown below.
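A minimal sketch of fetching the files with `huggingface-cli` (the repository id below is a placeholder for this repo's actual id, and `~/models` is just an example target directory):

```bash
# Sketch: install the CLI, then download the F16 language model and the F16 projector.
# <this-repo-id> is a placeholder; substitute this repository's actual id.
pip install -U "huggingface_hub[cli]"

huggingface-cli download <this-repo-id> \
  medgemma-4b-it-F16.gguf \
  mmproj-medgemma-4b-it-F16.gguf \
  --local-dir ~/models
```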

Install `llama.cpp` (https://github.com/ggml-org/llama.cpp). One way to build it from source is sketched below.
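A common build-from-source sketch (CMake options vary by platform and desired backend):

```bash
# Clone and build llama.cpp; this produces llama-server under build/bin.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build                      # CPU build; add e.g. -DGGML_CUDA=ON for CUDA GPUs
cmake --build build --config Release
```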

Run the server via:

`llama-server -m ~/models/medgemma-4b-it-F16.gguf --mmproj ~/models/mmproj-medgemma-4b-it-F16.gguf -c 2048 --port 8080`
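If your build has GPU support, you can additionally pass a flag such as `-ngl 99` to offload layers to the GPU. Once the server is running, recent `llama.cpp` builds expose an OpenAI-compatible `/v1/chat/completions` endpoint; the request below is a rough sketch of a multimodal query (the image file and prompt are placeholders, and the accepted payload may differ slightly across `llama.cpp` versions):

```bash
# Sketch: send a text prompt plus one base64-encoded image to the local server.
# "example.png" is a placeholder; on macOS use `base64 -i example.png` instead of -w0.
IMG_B64=$(base64 -w0 example.png)

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe the key findings in this image."},
          {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG_B64"'"}}
        ]
      }
    ],
    "max_tokens": 256
  }'
```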

Then use the model. For example usage via a visual chat interface, see https://github.com/kelkalot/medgemma-visual-chat