# GemmaECG-Vision

**GemmaECG-Vision** is a fine-tuned vision-language model built on `google/gemma-3n-e2b`, designed for ECG image interpretation tasks. The model accepts a medical ECG image along with a clinical instruction prompt and generates a structured analysis suitable for triage or documentation use cases.

This model was developed using Unsloth for efficient fine-tuning and supports image + text inputs with medical task-specific prompt formatting. It is designed to run in offline or edge environments, enabling healthcare triage in resource-constrained settings.
## Model Objective

To assist healthcare professionals and emergency responders by providing AI-generated ECG analysis directly from medical images, without requiring internet access or cloud resources.
## Usage

This model expects:

- An ECG image (`PIL.Image`)
- A textual instruction such as:

```
You are a clinical assistant specialized in ECG interpretation. Given an ECG image, generate a concise, structured, and medically accurate report.
Use this exact format:

Rhythm:
PR Interval:
QRS Duration:
Axis:
Bundle Branch Blocks:
Atrial Abnormalities:
Ventricular Hypertrophy:
Q Wave or QS Complexes:
T Wave Abnormalities:
ST Segment Changes:
Final Impression:
```
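To request the structured report above, pass that instruction verbatim as the text part of the user message (a minimal sketch; the full inference pipeline is shown in the next section):

```python
# The exact structured-report instruction from the Usage section above.
STRUCTURED_PROMPT = (
    "You are a clinical assistant specialized in ECG interpretation. "
    "Given an ECG image, generate a concise, structured, and medically accurate report.\n"
    "Use this exact format:\n"
    "Rhythm:\nPR Interval:\nQRS Duration:\nAxis:\nBundle Branch Blocks:\n"
    "Atrial Abnormalities:\nVentricular Hypertrophy:\nQ Wave or QS Complexes:\n"
    "T Wave Abnormalities:\nST Segment Changes:\nFinal Impression:"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},  # the image itself is supplied separately to the processor
            {"type": "text", "text": STRUCTURED_PROMPT},
        ],
    }
]
```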
## Inference Example (Python)

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Gemma3nForConditionalGeneration

model_id = "yasserrmd/GemmaECG-Vision"

# Load the model in bfloat16 and move it to the GPU for inference.
model = Gemma3nForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).eval().to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example_ecg.png").convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Interpret this ECG and provide a structured triage report."},
        ],
    }
]

# Render the chat template to a prompt string, then encode text and image together.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    use_cache=True,
)

result = processor.decode(outputs[0], skip_special_tokens=True)
print(result)
```
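The decoded string includes the prompt as well as the generated report. An optional refinement is to decode only the newly generated tokens by slicing off the prompt length:

```python
# Decode only the tokens generated after the prompt.
gen_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(gen_tokens, skip_special_tokens=True))
```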
## Training Details

- Framework: Unsloth + TRL `SFTTrainer` (configuration sketched below)
- Hardware: Google Colab Pro (NVIDIA L4 GPU)
- Batch size: 2
- Epochs: 1
- Learning rate: 2e-4
- Scheduler: cosine
- Loss: cross-entropy
- Precision: bfloat16
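A minimal sketch of the corresponding TRL configuration, using only the hyperparameters listed above (everything else, including the output path, is an assumption or TRL default):

```python
from trl import SFTConfig

# Hyperparameters taken from the list above; other settings are assumptions.
config = SFTConfig(
    output_dir="gemma-ecg-vision",     # hypothetical output directory
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    bf16=True,
)
```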
## Dataset

The training dataset is a curated subset of the PULSE-ECG/ECGInstruct dataset, reformatted for VLM instruction tuning.

- 3,272 samples of ECG image + structured instruction + clinical output
- Focused on realistic and medically relevant triage cases

Dataset link: [yasserrmd/pulse-ecg-instruct-subset](https://huggingface.co/datasets/yasserrmd/pulse-ecg-instruct-subset)
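The subset can be loaded directly with the `datasets` library (the column names are not documented here, so inspect a sample for the exact schema):

```python
from datasets import load_dataset

# Load the curated fine-tuning subset from the Hugging Face Hub.
ds = load_dataset("yasserrmd/pulse-ecg-instruct-subset", split="train")
print(ds)            # column names and row count (3,272 samples)
print(ds[0].keys())  # inspect one sample's fields
```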
## Training Loss Summary

The model was fine-tuned over 409 steps on the pulse-ecg-instruct-subset dataset. Training loss started above 9.5 and declined steadily to below 0.5, showing consistent convergence over the single epoch. The loss curve is stable, without overfitting spikes; the chart below visualizes this progression and highlights the model's rapid adaptation to the ECG image-to-text task.
## Intended Use

- Emergency triage in offline settings (see the offline loading sketch after this list)
- On-device ECG assessment
- Integration with medical edge devices (Jetson, Raspberry Pi, Android)
- Rapid analysis during disaster response
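For fully offline use, download the weights once and then load them without network access. A minimal sketch, assuming the files are already in the local Hugging Face cache (e.g. via `huggingface-cli download yasserrmd/GemmaECG-Vision`):

```python
import torch
from transformers import AutoProcessor, Gemma3nForConditionalGeneration

# local_files_only=True fails fast instead of attempting to reach the Hub.
model = Gemma3nForConditionalGeneration.from_pretrained(
    "yasserrmd/GemmaECG-Vision",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
processor = AutoProcessor.from_pretrained(
    "yasserrmd/GemmaECG-Vision", local_files_only=True
)
```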
## Limitations

- Not intended to replace licensed medical professionals
- Accuracy may vary depending on image quality
- Model outputs should be reviewed by a clinician before action
## License

This model is licensed under CC BY 4.0. You are free to use, modify, and distribute it with attribution.
## Author

Mohamed Yasser ([Hugging Face profile](https://huggingface.co/yasserrmd))