chhatramani committed · Commit 80614ae · verified · 1 Parent(s): d7e25f5

Update README.md

Files changed (1): README.md (+105 −5)
---
license: cc-by-sa-4.0
tags:
- unsloth
datasets:
- unsloth/Radiology_mini
language:
- en
base_model:
- unsloth/gemma-3n-E2B-it
pipeline_tag: visual-question-answering
---

# chhatramani/Gemma3n_Radiology_v1

## Fine-Tuned Gemma 3N for Medical VQA on ROCOv2

This repository hosts `chhatramani/Gemma3n_Radiology_v1`, a vision-language model (VLM) fine-tuned on the ROCOv2 radiology dataset for Medical Visual Question Answering (VQA). The model leverages the **Gemma 3N** architecture from Google's Gemmaverse, with both its vision and language components fine-tuned for improved performance in the medical domain.

**Please Note:** This model was developed as an experimental project for research and educational purposes only. It is not intended for clinical use or to provide medical advice. Always consult qualified medical professionals for diagnosis and treatment.

## Model Description

`chhatramani/Gemma3n_Radiology_v1` is built upon the `unsloth/gemma-3n-E2B-it` base model. It has undergone parameter-efficient fine-tuning (PEFT) with LoRA adapters targeting both the vision and language layers, including the attention and MLP modules. This keeps training efficient: only a small percentage of the model's parameters are updated, while the model is still adapted to the medical domain.

The fine-tuning process aimed to transform a general-purpose VLM into a specialized tool for medical professionals, capable of analyzing medical images (X-rays, CT scans, ultrasounds) and understanding expert-written captions describing medical conditions and diseases.

## Training Details

The model was fine-tuned with the following key technologies and settings (a configuration sketch follows the list):

* **Base Model:** `unsloth/gemma-3n-E2B-it`
* **Fine-tuning Frameworks:** Unsloth, Hugging Face Transformers, TRL
* **PEFT Method:** LoRA (Low-Rank Adaptation)
  * `finetune_vision_layers`: `True`
  * `finetune_language_layers`: `True`
  * `finetune_attention_modules`: `True`
  * `finetune_mlp_modules`: `True`
  * `r`: 16
  * `lora_alpha`: 16
  * `lora_dropout`: 0
  * `bias`: `"none"`
  * `random_state`: 3407
  * `use_rslora`: `False`
  * `loftq_config`: `None`
  * `target_modules`: `"all-linear"`
  * `modules_to_save`: `["lm_head", "embed_tokens"]`
* **Dataset:** A sampled version of the [ROCO radiology dataset](https://huggingface.co/datasets/unsloth/Radiology_mini); the full ROCOv2 dataset is available [here](https://huggingface.co/datasets/eltorio/ROCOv2-radiology). It consists of medical images (X-rays, CT scans, ultrasounds) paired with expert-written captions.
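
The exact training script is not included in this repository, but as a rough guide, the hyperparameters above map onto Unsloth's `FastVisionModel.get_peft_model` call along the following lines. This is a minimal sketch, not the verbatim training code: the 4-bit base-model load and the omitted TRL training loop are assumptions.

```python
from unsloth import FastVisionModel

# Load the base model (assumption: 4-bit quantized load to keep memory low)
model, processor = FastVisionModel.from_pretrained(
    "unsloth/gemma-3n-E2B-it",
    load_in_4bit = True,
)

# Attach LoRA adapters using the configuration listed above
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = True,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r               = 16,
    lora_alpha      = 16,
    lora_dropout    = 0,
    bias            = "none",
    random_state    = 3407,
    use_rslora      = False,
    loftq_config    = None,
    target_modules  = "all-linear",
    modules_to_save = ["lm_head", "embed_tokens"],
)
```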

### Installation (for local reproduction/usage)

To use this model locally or reproduce its training, install the necessary libraries. Unsloth is recommended for optimized performance.

```bash
# For Colab notebooks (or similar environments)
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" huggingface_hub hf_transfer
!pip install --no-deps unsloth

# Install a Gemma 3N-compatible transformers release and the latest timm
!pip install --no-deps transformers==4.53.1
!pip install --no-deps --upgrade timm
```
## Usage (Inference)

To run inference, load the model directly from the Hugging Face Hub with Unsloth's `FastVisionModel`.
```python
from unsloth import FastVisionModel
from PIL import Image

# Load the fine-tuned model and its processor
model, processor = FastVisionModel.from_pretrained(
    "chhatramani/Gemma3n_Radiology_v1",
    load_in_4bit = True,  # 4-bit loading reduces memory use at inference time
)
FastVisionModel.for_inference(model)  # switch Unsloth into inference mode

# Replace this with your own medical image and question
image_path = "path/to/your/medical_image.jpg"
image = Image.open(image_path).convert("RGB")
question = "What medical condition is shown in this image?"

# Format the prompt with the chat template so the image token is placed
# correctly (standard Unsloth vision-inference pattern)
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": question},
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

# Generate and decode the response
outputs = model.generate(**inputs, max_new_tokens=200)
generated_text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(generated_text)
```
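
**Note:** `load_in_4bit = True` requires a CUDA GPU with `bitsandbytes` installed; set it to `False` to load full-precision weights if you have the memory for them.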

## Dataset Information

The ROCOv2 (Radiology Objects in Context) dataset is a comprehensive collection of radiology images and their corresponding expert-written captions. The sampled version used for this fine-tuning, `unsloth/Radiology_mini`, provides a subset suitable for efficient experimentation and training.

Dataset features:

* `image`: The medical image (X-ray, CT scan, or ultrasound).
* `image_id`: Unique identifier for the image.
* `caption`: Expert-written description of the medical image.
* `cui`: Concept Unique Identifier (from the UMLS Metathesaurus), providing standardized medical terminology.
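
If you want to inspect the data yourself, here is a minimal sketch using the `datasets` library; the `train` split name is an assumption, so check the dataset card for the exact splits.

```python
from datasets import load_dataset

# Load the sampled ROCOv2 subset used for this fine-tune
# (assumes a "train" split; see the dataset card for the exact splits)
dataset = load_dataset("unsloth/Radiology_mini", split="train")

# Each example pairs an image with its expert-written caption and CUI codes
sample = dataset[0]
print(sample["image_id"])
print(sample["caption"])
print(sample["cui"])
sample["image"].show()  # the image field is decoded as a PIL image
```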