---
license: gemma
base_model: google/medgemma-4b-it
tags:
- gguf
- llama.cpp
- quantized
- q5_k_m
- medical
- chat
library_name: llama.cpp
inference: false
datasets:
- ruslanmv/ai-medical-chatbot
language:
- en
pipeline_tag: image-text-to-text
---

# medgemma-4b-it — medical fine-tune (5-bit GGUF)

## Model Details

A Q5_K_M (5-bit) GGUF quantization of `google/medgemma-4b-it`, fine-tuned for medical chat on the `ruslanmv/ai-medical-chatbot` dataset and packaged for CPU inference with llama.cpp.

## Files

- `medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf` (~2.83 GB)

## How to run (llama.cpp)

```bash
# Requires a llama.cpp build with Hugging Face Hub download support (libcurl):
llama-cli \
  --hf-repo sharadsnaik/medgemma-4b-it-medical-gguf \
  --hf-file medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf \
  -p "Hello"
```

## How to Get Started with the Model

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

p = hf_hub_download(
    "sharadsnaik/medgemma-4b-it-medical-gguf",
    "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf",
)
llm = Llama(model_path=p, n_ctx=4096, n_threads=8, chat_format="gemma")
print(llm.create_chat_completion(messages=[{"role": "user", "content": "Hello"}]))
```

## Training Details

### Training Data

- [ruslanmv/ai-medical-chatbot](https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot)

## Sample Code Usage

### `app.py`

```python
import os

import gradio as gr
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Model repo + filename on the Hub
REPO_ID = "sharadsnaik/medgemma-4b-it-medical-gguf"
FILENAME = "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf"

# Download from the Hub to the local cache
MODEL_PATH = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, repo_type="model")

# Create the llama.cpp model.
# Use all available CPU threads; chat_format="gemma" matches Gemma-style prompts.
llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=4096,
    n_threads=os.cpu_count(),
    chat_format="gemma",  # important for Gemma/MedGemma instruction formatting
)


def chat_fn(message, history):
    # Convert Gradio tuple-style history -> OpenAI-style messages
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        if bot_msg:
            messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})

    out = llm.create_chat_completion(messages=messages, temperature=0.6, top_p=0.95)
    return out["choices"][0]["message"]["content"]


demo = gr.ChatInterface(fn=chat_fn, title="MedGemma 4B (Q5_K_M) — CPU Space")

if __name__ == "__main__":
    demo.launch()
```
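
### Streaming responses

For a more responsive UI, `llama-cpp-python` can stream tokens as they are generated: passing `stream=True` to `create_chat_completion` yields OpenAI-style chunks, and `gr.ChatInterface` accepts a generator function. A minimal sketch, reusing the `llm` handle from `app.py` (the name `chat_fn_stream` is illustrative, and history handling is omitted for brevity):

```python
def chat_fn_stream(message, history):
    # stream=True yields OpenAI-style chunks with incremental "delta" content.
    messages = [{"role": "user", "content": message}]
    reply = ""
    for chunk in llm.create_chat_completion(
        messages=messages, stream=True, temperature=0.6, top_p=0.95
    ):
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            reply += delta["content"]
            yield reply  # Gradio re-renders the partial reply on each yield
```

Pass `chat_fn_stream` as `fn=` to `gr.ChatInterface` in place of `chat_fn`.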
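
### `requirements.txt` (for a Space)

If you host `app.py` as a Hugging Face Space, the Space also needs a `requirements.txt`. The package names below are just the three imports the script uses; version pins are left to you, and note that `llama-cpp-python` may compile from source on a CPU-only Space, which can slow the first build:

```text
gradio
huggingface_hub
llama-cpp-python
```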