---
license: gemma
base_model: google/medgemma-4b-it
tags:
  - gguf
  - llama.cpp
  - quantized
  - q5_k_m
  - medical
  - chat
library_name: llama.cpp
inference: false
datasets:
  - ruslanmv/ai-medical-chatbot
language:
  - en
pipeline_tag: image-text-to-text
---

# medgemma-4b-it: medical fine-tune (5-bit GGUF)

## Model Details

A Q5_K_M (5-bit) GGUF quantization of a medical fine-tune of [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it), fine-tuned on [ruslanmv/ai-medical-chatbot](https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot) and packaged for CPU inference with llama.cpp.

### Files

- `medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf` (~2.83 GB)
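
If you want to confirm the exact filename before downloading (the loaders below take it verbatim), here is a minimal sketch using huggingface_hub's `list_repo_files`:

```python
from huggingface_hub import list_repo_files

# Print the GGUF files available in this repo (requires network access):
for f in list_repo_files("sharadsnaik/medgemma-4b-it-medical-gguf"):
    if f.endswith(".gguf"):
        print(f)
```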

## How to run (llama.cpp)

```bash
# Requires llama.cpp. You can run directly from the Hub path:
llama-cli -m hf://sharadsnaik/medgemma-4b-it-medical-gguf/medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf -p "Hello"
```
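
If you'd rather stay in Python, recent llama-cpp-python releases can fetch the file from the Hub themselves. A sketch, assuming a version that ships `Llama.from_pretrained` (which requires `huggingface_hub` to be installed):

```python
from llama_cpp import Llama

# Downloads the GGUF from the Hub (cached locally) and loads it in one step:
llm = Llama.from_pretrained(
    repo_id="sharadsnaik/medgemma-4b-it-medical-gguf",
    filename="medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf",
    n_ctx=4096,
    chat_format="gemma",
)
```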

## How to Get Started with the Model

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the GGUF file to the local Hub cache, then load it with llama-cpp-python:
p = hf_hub_download("sharadsnaik/medgemma-4b-it-medical-gguf",
                    "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf")
llm = Llama(model_path=p, n_ctx=4096, n_threads=8, chat_format="gemma")
print(llm.create_chat_completion(messages=[{"role": "user", "content": "Hello"}]))
```
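
For interactive use you may prefer streaming. The same `llm` object supports `stream=True`, yielding OpenAI-style delta chunks; a sketch with an arbitrary example prompt:

```python
# Stream tokens as they are generated instead of waiting for the full reply:
stream = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What does Q5_K_M quantization mean?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```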

## Training Details

### Training Data

Fine-tuned on [ruslanmv/ai-medical-chatbot](https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot).
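
To inspect the training data locally, you can load it with the `datasets` library; a minimal sketch that prints the schema rather than assuming column names:

```python
from datasets import load_dataset

ds = load_dataset("ruslanmv/ai-medical-chatbot", split="train")
print(ds.column_names)  # inspect the schema
print(ds[0])            # first Q&A example
```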

## Sample Code Usage

### app.py

```python
import os

import gradio as gr
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Your model repo + filename
REPO_ID = "sharadsnaik/medgemma-4b-it-medical-gguf"
FILENAME = "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf"

# Download from Hub to local cache
MODEL_PATH = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, repo_type="model")

# Create the llama.cpp model.
# Use all available CPU threads; chat_format="gemma" matches Gemma-style prompts.
llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=4096,
    n_threads=os.cpu_count(),
    chat_format="gemma",  # important for Gemma/MedGemma instruction formatting
)

def chat_fn(message, history):
    # Convert Gradio (user, bot) tuple history -> OpenAI-style messages
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        if bot_msg:
            messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})

    out = llm.create_chat_completion(messages=messages, temperature=0.6, top_p=0.95)
    reply = out["choices"][0]["message"]["content"]
    return reply

demo = gr.ChatInterface(fn=chat_fn, title="MedGemma 4B (Q5_K_M) — CPU Space")

if __name__ == "__main__":
    demo.launch()
```
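
A quick way to exercise `chat_fn` without launching the UI; a sketch, noting that importing `app` still downloads and loads the GGUF model, and the sample turn is hypothetical:

```python
# smoke_test.py: assumes app.py sits in the same directory
from app import chat_fn

# Gradio tuple-style history: list of (user, assistant) pairs
history = [("Hi, I have had a sore throat for three days.",
            "Do you also have a fever or swollen glands?")]
print(chat_fn("Yes, a mild fever since yesterday.", history))
```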