Building the Future of AI-Driven Healthcare Through Open Innovation

Online Demo
Open in Colab

Gemma3-DeepMed-Small-4B is a fine-tuned version of Google's Gemma3-4B, designed to enhance medical reasoning capabilities through cutting-edge optimisation techniques. Leveraging methodologies from DeepSeek R1, this model demonstrates strong logical inference and structured decision-making in diverse medical scenarios.

🧠 Advanced Training Techniques: Training follows a two-stage optimisation process, combining Supervised Fine-Tuning (SFT) with Group Relative Policy Optimisation (GRPO):

  • Supervised Fine-Tuning (SFT): The model is first trained on a carefully curated set of high-quality medical reasoning examples (cold-start data), ensuring strong foundational knowledge.
  • Reasoning-Focused RL: Using GRPO, the model refines its ability to analyse, infer, and generalise, improving its diagnostic accuracy and medical decision-making.
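The GRPO stage scores each sampled completion with programmatic rewards and normalises those rewards within the group sampled for a single prompt. As an illustrative sketch (the actual reward functions used for this model are not published; the format check below is an assumption modelled on DeepSeek-R1-style `<think>` traces):

```python
import re

def format_reward(completion: str) -> float:
    """Reward 1.0 if the completion wraps its reasoning in <think>...</think>
    before giving a final answer, else 0.0. Illustrative only: the reward
    functions used to train this model are not documented."""
    pattern = r"<think>.+?</think>\s*\S"
    return 1.0 if re.search(pattern, completion, re.DOTALL) else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO's core idea: the advantage of each completion is its reward
    standardised against the other completions in the same group,
    advantage = (r - mean) / std, with no separate value network."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero for identical rewards
    return [(r - mean) / std for r in rewards]

group = [format_reward(c) for c in [
    "<think>Fever plus rash suggests a viral exanthem.</think> Likely measles.",
    "Likely measles.",  # no reasoning trace
]]
print(group_relative_advantages(group))  # → [1.0, -1.0]
```

The standardised advantages then weight the policy-gradient update, so completions that satisfy the reward criteria are reinforced relative to their group peers.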

💡 Efficient & Scalable AI for Embedded Systems: As part of our commitment to bringing AI-powered healthcare closer to patients, enabling models with fewer parameters to reason marks a key step in our progress toward integrating System 2 thinking AI models into the microcontrollers of our medical devices. This advancement brings us closer to our goal of facilitating real-time processing of sensor data and delivering actionable health insights, even in resource-limited environments.

🚀 Expanding Our Vision: Gemma3-DeepMed-27B (Coming Soon) The advancements seen in this compact model have motivated us to fine-tune Gemma 3 27B, creating a more powerful medical reasoning AI. This model is not intended to be served on our medical devices, but to act as an improved medical QA chatbot for users of our mobile app.

  • This larger model will be trained on function calling data, making it more suitable for agentic applications within the healthcare domain.
  • Google’s Gemma 3 architecture, which supports 140+ languages, makes it an ideal choice for enhancing accessibility.

🌍 Commitment to Open AI & Research: We believe in the power of open-source AI to drive innovation, transparency, and accessibility in medical technology. By refining reasoning capabilities in smaller models, we aim to lower the barrier to accessing AI-driven healthcare, ensuring that advanced medical intelligence is available even in low-resource settings.

⚙️ Release Details:

  • Model Size: 4 billion parameters
  • Quantization: The model can be quantized at load time through libraries such as Unsloth
  • Language(s) (NLP): 140+
  • Developed By: Brian Karuga from Tulivu AI
  • Terms of Use: Gemma Terms of Use
  • Fine-tuned from Model: Gemma-3-4B-it
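To see why load-time quantization matters for the embedded-deployment goals described above, here is a back-of-the-envelope estimate of the weight memory for a 4B-parameter model at different precisions (rough figures only; a real footprint adds overhead for activations, KV cache, and quantization metadata):

```python
PARAMS = 4e9  # approximate parameter count for a 4B model

def weights_gib(bits_per_param: float) -> float:
    """Approximate weight storage in GiB at a given precision."""
    return PARAMS * bits_per_param / 8 / 2**30

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weights_gib(bits):.1f} GiB")
```

At 4-bit precision the weights shrink to roughly a quarter of their fp16 size, which is what makes single-consumer-GPU and eventually edge deployment plausible.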

Gemma3-DeepMed-Small-4B can be fine-tuned for more specialised medical tasks and datasets, allowing researchers and developers to adapt it for specific healthcare applications. We are thrilled to share Gemma3-DeepMed-Small-4B with the global research community, empowering developers, educators, and healthcare innovators to advance medical AI and drive meaningful impact in patient care. More models in this family are on the way!

Use with Unsloth. Important: use the exact chat template provided by the Gemma-3 instruct version; otherwise performance will degrade. The model output can be verbose in rare cases.

See the snippet below for usage:

from unsloth import FastModel
import torch

model, tokenizer = FastModel.from_pretrained(
    model_name="BrianGITS/Gemma3-DeepMed-Small-4B",
    max_seq_length=2048,  # Choose any length for long context
    load_in_4bit=False,  # Set True for 4-bit quantization to reduce memory
    load_in_8bit=False,  # Set True for 8-bit quantization (better accuracy than 4-bit)
    full_finetuning=False,  # Set True to enable full finetuning
)

prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""

question = "Your question goes here"

FastModel.for_inference(model)  # Optimized inference for speed

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
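Because the prompt seeds the response with a `<think>` tag, the generated text interleaves a reasoning trace with the final answer. A small helper (an assumption about the output shape, not part of any official API) can separate the two:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer), assuming a
    DeepSeek-R1-style '<think>...</think>' trace precedes the answer."""
    m = re.search(r"<think>(.*?)</think>(.*)", text, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()  # no trace found: treat everything as the answer

reasoning, answer = split_reasoning(
    "<think>Polyuria and polydipsia point toward diabetes.</think> Check HbA1c."
)
print(answer)  # → Check HbA1c.
```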

Training Hyperparameters

  • learning_rate: 5e-4
  • lr_scheduler: linear
  • train_batch_size: 12
  • eval_batch_size: 8
  • GPU: L4 24GB
  • optimizer: adamw_8bit
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 4
  • weight_decay: 0.01

PEFT Hyperparameters

  • adapter: qlora
  • lora_r: 64
  • lora_alpha: 64
  • lora_dropout: 0.05
  • lora_target_linear: true
  • lora_target_modules:
    • q_proj
    • v_proj
    • k_proj
    • o_proj
    • gate_proj
    • down_proj
    • up_proj
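A QLoRA adapter with these settings trains only a small fraction of the network: each targeted linear layer of shape (d_out, d_in) gains r·(d_in + d_out) adapter weights. A rough count (the hidden size, layer count, and square projection shapes below are illustrative simplifications, not Gemma-3-4B's exact dimensions):

```python
R = 64          # lora_r from the table above
HIDDEN = 2560   # illustrative hidden size, not Gemma-3-4B's exact value
N_LAYERS = 34   # illustrative transformer block count

def lora_pair_params(d_in: int, d_out: int, r: int = R) -> int:
    """LoRA adds matrices A (r x d_in) and B (d_out x r) per targeted layer."""
    return r * (d_in + d_out)

# Seven targeted projections per block, treated as HIDDEN x HIDDEN for simplicity.
per_block = 7 * lora_pair_params(HIDDEN, HIDDEN)
total = N_LAYERS * per_block
print(f"~{total / 1e6:.0f}M trainable adapter parameters vs ~4B frozen base weights")
```

Even with this generous approximation, the adapter stays under ~2% of the base model's parameters, which is what lets the training above fit on a single L4 24GB GPU.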

Training results: GRPO

Advisory Notice!

While DeepMed-Small-4B leverages high-quality data sources, its outputs may still contain inaccuracies, biases, or misalignments that could pose risks if relied upon for medical decision-making without further testing and refinement. The model's performance has not yet been rigorously evaluated in randomised controlled trials or real-world healthcare environments. Therefore, we strongly advise against using DeepMed-Small-4B for any direct patient care, clinical decision support, or other professional medical purposes at this time. Its use should be limited to research, development, and exploratory applications by qualified individuals who understand its limitations. DeepMed-Small-4B is intended solely as a research tool to assist healthcare professionals and should never be considered a replacement for the professional judgment and expertise of a qualified medical doctor.

Appropriately adapting and validating DeepMed-Small-4B for specific medical use cases would require significant additional work, potentially including:

• Thorough testing and evaluation in relevant clinical scenarios

• Alignment with evidence-based guidelines and best practices

• Mitigation of potential biases and failure modes

• Integration with human oversight and interpretation

• Compliance with regulatory and ethical standards

Always consult a qualified healthcare provider for personal medical needs.

Citation

If you find Tulivu-DeepMed-Small-4b useful in your work, please cite the model as follows:

@misc{ Tulivu-DeepMed,
  author = {Brian Karuga},
  title = { Tulivu-DeepMed: Building the Future of AI-Driven Healthcare Through Open Innovation},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{ https://huggingface.co/BrianGITS/Gemma3-DeepMed-Small-4B}}
}

💌 Contact

We look forward to hearing from you and collaborating on this exciting project!

Contributors:

  • Brian Karuga [brian dot karuga at strathmore dot edu]
  • Tulivu AI

References

We thank the Google Team for their amazing models!

Result sources

• [1] GPT-4: Capabilities of GPT-4 on Medical Challenge Problems (https://arxiv.org/abs/2303.13375)

• [2] Med-PaLM-1: Large Language Models Encode Clinical Knowledge

• [3] Med-PaLM-2: Towards Expert-Level Medical Question Answering with Large Language Models

• [4] Gemini-1.0: Gemini Goes to Med School
