kelkalot committed · Commit adb46bb (verified) · Parent: b98ce00

Update README.md

Files changed (1): README.md (+58, -3)

README.md (updated contents below; the previous version contained only the `license: apache-2.0` YAML front matter):
---
license: apache-2.0
tags:
- gguf
- medgemma
- gemma3
- multimodal
- llama.cpp
---

# MedGemma-4B-IT GGUF (Multimodal)

This repository provides GGUF-formatted model files for `google/medgemma-4b-it`, designed for use with `llama.cpp`. MedGemma is a multimodal model based on Gemma-3, fine-tuned for the medical domain.

These GGUF files allow you to run the MedGemma model locally on your CPU, or offload layers to a GPU if supported by your `llama.cpp` build (e.g., Metal on macOS, CUDA on Linux/Windows).
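
For example, GPU offload is controlled by `llama.cpp`'s `-ngl` (GPU layers) flag. The commands below are an illustrative sketch, not part of the original card; they assume the `Q8_0` model file from this repository has already been downloaded to `~/models`:

```bash
# CPU only: no layers offloaded to the GPU (model path is a placeholder).
llama-cli -m ~/models/medgemma-4b-it-Q8_0.gguf -p "Briefly describe what a GGUF file is." -n 128

# Offload all layers to the GPU on Metal/CUDA builds; lower -ngl if VRAM is limited.
llama-cli -m ~/models/medgemma-4b-it-Q8_0.gguf -ngl 99 -p "Briefly describe what a GGUF file is." -n 128
```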

**For multimodal (vision) capabilities, you MUST use both a language model GGUF file AND the provided `mmproj` (multimodal projector) GGUF file.**

**Original Model:** [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it)

## Files Provided

Below are the GGUF files available in this repository. It is recommended to use the `F16` version of the `mmproj` file with any of the language model quantizations.

### Language Model GGUFs:

* **`medgemma-4b-it-F16.gguf`**
  * Quantization: F16 (16-bit floating point)
  * Size: ~7.77 GB
  * Use: Highest precision, best quality, largest file size.
* **`medgemma-4b-it-Q8_0.gguf`**
  * Quantization: Q8_0
  * Size: ~4.13 GB
  * Use: Excellent balance between model quality and file size/performance.

### Multimodal Projector GGUFs (Required for Image Input):

* **`mmproj-medgemma-4b-it-Q8_0.gguf`**
  * Quantization: Q8_0
  * Size: ~591 MB
  * Use: **This file is essential for image understanding.** It should be used alongside any of the language model GGUF files listed above.
* **`mmproj-medgemma-4b-it-F16.gguf`**
  * Quantization: F16 (recommended precision for the projector)
  * Size: ~851 MB
  * Use: **This file is essential for image understanding.** It should be used alongside any of the language model GGUF files listed above.

## How to use

Download a language model GGUF file and an `mmproj` GGUF file (see the sketch below).
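
One way to do this is with the Hugging Face CLI. This is an illustrative sketch: `<repo-id>` is a placeholder for this repository's Hub id, and `~/models` is an arbitrary local directory.

```bash
# Hypothetical example; replace <repo-id> with this repository's id on the Hugging Face Hub.
pip install -U "huggingface_hub[cli]"
huggingface-cli download <repo-id> \
  medgemma-4b-it-F16.gguf \
  mmproj-medgemma-4b-it-F16.gguf \
  --local-dir ~/models
```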

Install `llama.cpp` (https://github.com/ggml-org/llama.cpp).
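
A minimal build-from-source sketch is shown below; exact steps vary by platform and `llama.cpp` version, and prebuilt releases or package managers are alternatives.

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# CPU/Metal build by default; add -DGGML_CUDA=ON for an NVIDIA GPU build.
cmake -B build
cmake --build build --config Release
# The llama-server and llama-cli binaries are placed in build/bin/.
```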

Run the server via:

`llama-server -m ~/models/medgemma-4b-it-F16.gguf --mmproj ~/models/mmproj-medgemma-4b-it-F16.gguf -c 2048 --port 8080`
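
Once the server is running, it can be queried over its OpenAI-compatible HTTP API. The requests below are an illustrative sketch: the payload shapes are assumed from recent `llama.cpp` server builds, and `<BASE64_IMAGE>` is a placeholder for a base64-encoded image.

```bash
# Text-only request to the chat completions endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "List common causes of chest pain."}
        ],
        "max_tokens": 256
      }'

# Image + text request (requires the --mmproj file to be loaded).
# <BASE64_IMAGE> is a placeholder, e.g. produced with a tool like base64.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": [
            {"type": "text", "text": "Describe the findings in this image."},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<BASE64_IMAGE>"}}
          ]}
        ],
        "max_tokens": 256
      }'
```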

Then use the model. For example usage via a visual chat interface, see https://github.com/kelkalot/medgemma-visual-chat.