---
license: gemma
base_model: google/medgemma-4b-it
tags:
- gguf
- llama.cpp
- quantized
- q5_k_m
- medical
- chat
library_name: llama.cpp
inference: false
datasets:
- ruslanmv/ai-medical-chatbot
language:
- en
pipeline_tag: image-text-to-text
---

# medgemma-4b-it — medical fine-tune (5-bit GGUF)

## Model Details

A Q5_K_M (5-bit) GGUF quantization of `google/medgemma-4b-it`, fine-tuned for medical chat on the `ruslanmv/ai-medical-chatbot` dataset and packaged for CPU inference with llama.cpp.

## Files

- `medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf` (~2.83 GB)

## How to run (llama.cpp)

```bash
# Requires a llama.cpp build with Hugging Face Hub download support (libcurl):
llama-cli \
  --hf-repo sharadsnaik/medgemma-4b-it-medical-gguf \
  --hf-file medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf \
  -p "Hello"
```

## How to Get Started with the Model

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

p = hf_hub_download(
    "sharadsnaik/medgemma-4b-it-medical-gguf",
    "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf",
)
llm = Llama(model_path=p, n_ctx=4096, n_threads=8, chat_format="gemma")
print(llm.create_chat_completion(messages=[{"role": "user", "content": "Hello"}]))
```

## Training Details

### Training Data

- [ruslanmv/ai-medical-chatbot](https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot)

## Sample Code Usage

### `app.py`

```python
import os

import gradio as gr
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Model repo + filename on the Hub
REPO_ID = "sharadsnaik/medgemma-4b-it-medical-gguf"
FILENAME = "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf"

# Download from the Hub to the local cache
MODEL_PATH = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, repo_type="model")

# Create the llama.cpp model.
# Use all available CPU threads; chat_format="gemma" matches Gemma-style prompts.
llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=4096,
    n_threads=os.cpu_count(),
    chat_format="gemma",  # important for Gemma/MedGemma instruction formatting
)


def chat_fn(message, history):
    # Convert Gradio tuple-style history -> OpenAI-style messages
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        if bot_msg:
            messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})

    out = llm.create_chat_completion(messages=messages, temperature=0.6, top_p=0.95)
    return out["choices"][0]["message"]["content"]


demo = gr.ChatInterface(fn=chat_fn, title="MedGemma 4B (Q5_K_M) — CPU Space")

if __name__ == "__main__":
    demo.launch()
```
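
### Streaming responses

For a more responsive UI, `llama-cpp-python` can stream tokens as they are generated: passing `stream=True` to `create_chat_completion` yields OpenAI-style chunks, and `gr.ChatInterface` accepts a generator function. A minimal sketch, reusing the `llm` handle from `app.py` (the name `chat_fn_stream` is illustrative, and history handling is omitted for brevity):

```python
def chat_fn_stream(message, history):
    # stream=True yields OpenAI-style chunks with incremental "delta" content.
    messages = [{"role": "user", "content": message}]
    reply = ""
    for chunk in llm.create_chat_completion(
        messages=messages, stream=True, temperature=0.6, top_p=0.95
    ):
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            reply += delta["content"]
            yield reply  # Gradio re-renders the partial reply on each yield
```

Pass `chat_fn_stream` as `fn=` to `gr.ChatInterface` in place of `chat_fn`.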
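
### `requirements.txt` (for a Space)

If you host `app.py` as a Hugging Face Space, the Space also needs a `requirements.txt`. The package names below are just the three imports the script uses; version pins are left to you, and note that `llama-cpp-python` may compile from source on a CPU-only Space, which can slow the first build:

```text
gradio
huggingface_hub
llama-cpp-python
```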