Update README.md
README.md CHANGED
---
license: apache-2.0

tags:
- gguf
- medgemma
- gemma3
- multimodal
- llama.cpp
---

# MedGemma-4B-IT GGUF (Multimodal)

This repository provides GGUF-formatted model files for `google/medgemma-4b-it`, designed for use with `llama.cpp`. MedGemma is a multimodal model based on Gemma-3, fine-tuned for the medical domain.

These GGUF files allow you to run the MedGemma model locally on your CPU, or offload layers to a GPU if supported by your `llama.cpp` build (e.g., Metal on macOS, CUDA on Linux/Windows).

**For multimodal (vision) capabilities, you MUST use both a language model GGUF file AND the provided `mmproj` (multimodal projector) GGUF file.**

**Original Model:** [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it)

## Files Provided

Below are the GGUF files available in this repository. It is recommended to use the `F16` version of the `mmproj` file with any of the language model quantizations.

### Language Model GGUFs:

* **`medgemma-4b-it-F16.gguf`**:
  * Quantization: F16 (16-bit floating point)
  * Size: ~7.77 GB (verify against the actual file size)
  * Use: Highest precision and best quality; largest file size.
* **`medgemma-4b-it-Q8_0.gguf`**:
  * Quantization: Q8_0
  * Size: ~4.13 GB (verify against the actual file size)
  * Use: Excellent balance between model quality and file size/performance.

### Multimodal Projector GGUFs (Required for Image Input):

* **`mmproj-medgemma-4b-it-Q8_0.gguf`**:
  * Quantization: Q8_0
  * Size: ~591 MB
  * Use: **This file is essential for image understanding.** It should be used alongside any of the language model GGUF files listed above.
* **`mmproj-medgemma-4b-it-F16.gguf`**:
  * Quantization: F16 (recommended precision for the projector)
  * Size: ~851 MB
  * Use: **This file is essential for image understanding.** It should be used alongside any of the language model GGUF files listed above.

## How to use?

Download the language model and `mmproj` GGUF files, for example as shown below.
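A minimal sketch of fetching the files with `huggingface-cli` (the repository id below is a placeholder for this repo's actual id, and `~/models` is just an example target directory):

```bash
# Sketch: install the CLI, then download the F16 language model and the F16 projector.
# <this-repo-id> is a placeholder; substitute this repository's actual id.
pip install -U "huggingface_hub[cli]"

huggingface-cli download <this-repo-id> \
  medgemma-4b-it-F16.gguf \
  mmproj-medgemma-4b-it-F16.gguf \
  --local-dir ~/models
```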

Install `llama.cpp` (https://github.com/ggml-org/llama.cpp). One way to build it from source is sketched below.
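A common build-from-source sketch (CMake options vary by platform and desired backend):

```bash
# Clone and build llama.cpp; this produces llama-server under build/bin.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build                      # CPU build; add e.g. -DGGML_CUDA=ON for CUDA GPUs
cmake --build build --config Release
```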

Run the server via:

`llama-server -m ~/models/medgemma-4b-it-F16.gguf --mmproj ~/models/mmproj-medgemma-4b-it-F16.gguf -c 2048 --port 8080`
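If your build has GPU support, you can additionally pass a flag such as `-ngl 99` to offload layers to the GPU. Once the server is running, recent `llama.cpp` builds expose an OpenAI-compatible `/v1/chat/completions` endpoint; the request below is a rough sketch of a multimodal query (the image file and prompt are placeholders, and the accepted payload may differ slightly across `llama.cpp` versions):

```bash
# Sketch: send a text prompt plus one base64-encoded image to the local server.
# "example.png" is a placeholder; on macOS use `base64 -i example.png` instead of -w0.
IMG_B64=$(base64 -w0 example.png)

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe the key findings in this image."},
          {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG_B64"'"}}
        ]
      }
    ],
    "max_tokens": 256
  }'
```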

Then use the model. For example usage via a visual chat interface, see https://github.com/kelkalot/medgemma-visual-chat