chhatramani committed · Commit 80614ae · verified · 1 Parent(s): d7e25f5

Update README.md

Files changed (1): README.md (+105 −5)
---
license: cc-by-sa-4.0
tags:
- unsloth
datasets:
- unsloth/Radiology_mini
language:
- en
base_model:
- unsloth/gemma-3n-E2B-it
pipeline_tag: visual-question-answering
---

# chhatramani/Gemma3n_Radiology_v1

## Fine-Tuned Gemma 3N for Medical VQA on ROCOv2

This repository hosts `chhatramani/Gemma3n_Radiology_v1`, a vision-language model (VLM) fine-tuned on the ROCOv2 radiology dataset for Medical Visual Question Answering (VQA). The model leverages the **Gemma 3N** architecture from Google's Gemmaverse, with both its vision and language components fine-tuned for improved performance in the medical domain.

**Please Note:** This model was developed as an experimental project for research and educational purposes only. It is not intended for clinical use or to provide medical advice. Always consult qualified medical professionals for diagnosis and treatment.

## Model Description

`chhatramani/Gemma3n_Radiology_v1` is built upon the `unsloth/gemma-3n-E2B-it` base model. It has undergone parameter-efficient fine-tuning (PEFT) with LoRA adapters targeting both the vision and language layers, including the attention and MLP modules. This keeps training efficient: only a small percentage of the model's parameters are updated, while the model is still adapted to the medical domain.

The fine-tuning process aimed to transform a general-purpose VLM into a specialized tool for medical professionals, capable of analyzing medical images (X-rays, CT scans, ultrasounds) and understanding expert-written captions describing medical conditions and diseases.

## Training Details

The model was fine-tuned with the following key technologies and settings (a configuration sketch follows the list):

* **Base Model:** `unsloth/gemma-3n-E2B-it`
* **Fine-tuning Frameworks:** Unsloth, Hugging Face Transformers, TRL
* **PEFT Method:** LoRA (Low-Rank Adaptation)
  * `finetune_vision_layers`: `True`
  * `finetune_language_layers`: `True`
  * `finetune_attention_modules`: `True`
  * `finetune_mlp_modules`: `True`
  * `r`: 16
  * `lora_alpha`: 16
  * `lora_dropout`: 0
  * `bias`: `"none"`
  * `random_state`: 3407
  * `use_rslora`: `False`
  * `loftq_config`: `None`
  * `target_modules`: `"all-linear"`
  * `modules_to_save`: `["lm_head", "embed_tokens"]`
* **Dataset:** A sampled version of the [ROCO radiology dataset](https://huggingface.co/datasets/unsloth/Radiology_mini); the full ROCOv2 dataset is available [here](https://huggingface.co/datasets/eltorio/ROCOv2-radiology). It consists of medical images (X-rays, CT scans, ultrasounds) paired with expert-written captions.
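
The exact training script is not included in this repository, but as a rough guide, the hyperparameters above map onto Unsloth's `FastVisionModel.get_peft_model` call along the following lines. This is a minimal sketch, not the verbatim training code: the 4-bit base-model load and the omitted TRL training loop are assumptions.

```python
from unsloth import FastVisionModel

# Load the base model (assumption: 4-bit quantized load to keep memory low)
model, processor = FastVisionModel.from_pretrained(
    "unsloth/gemma-3n-E2B-it",
    load_in_4bit = True,
)

# Attach LoRA adapters using the configuration listed above
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = True,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r               = 16,
    lora_alpha      = 16,
    lora_dropout    = 0,
    bias            = "none",
    random_state    = 3407,
    use_rslora      = False,
    loftq_config    = None,
    target_modules  = "all-linear",
    modules_to_save = ["lm_head", "embed_tokens"],
)
```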

### Installation (for local reproduction/usage)

To use this model locally or reproduce its training, install the necessary libraries. Unsloth is recommended for optimized performance.

```bash
# For Colab notebooks (or similar environments)
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" huggingface_hub hf_transfer
!pip install --no-deps unsloth

# Install a Gemma 3N-compatible transformers release and the latest timm
!pip install --no-deps transformers==4.53.1
!pip install --no-deps --upgrade timm
```
## Usage (Inference)

To run inference, load the model directly from the Hugging Face Hub with Unsloth's `FastVisionModel`.
```python
from unsloth import FastVisionModel
from PIL import Image

# Load the fine-tuned model and its processor
model, processor = FastVisionModel.from_pretrained(
    "chhatramani/Gemma3n_Radiology_v1",
    load_in_4bit = True,  # 4-bit loading reduces memory use at inference time
)
FastVisionModel.for_inference(model)  # switch Unsloth into inference mode

# Replace this with your own medical image and question
image_path = "path/to/your/medical_image.jpg"
image = Image.open(image_path).convert("RGB")
question = "What medical condition is shown in this image?"

# Format the prompt with the chat template so the image token is placed
# correctly (standard Unsloth vision-inference pattern)
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": question},
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

# Generate and decode the response
outputs = model.generate(**inputs, max_new_tokens=200)
generated_text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(generated_text)
```
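
**Note:** `load_in_4bit = True` requires a CUDA GPU with `bitsandbytes` installed; set it to `False` to load full-precision weights if you have the memory for them.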

## Dataset Information

The ROCOv2 (Radiology Objects in Context) dataset is a comprehensive collection of radiology images and their corresponding expert-written captions. The sampled version used for this fine-tuning, `unsloth/Radiology_mini`, provides a subset suitable for efficient experimentation and training.

Dataset features:

* `image`: The medical image (X-ray, CT scan, or ultrasound).
* `image_id`: Unique identifier for the image.
* `caption`: Expert-written description of the medical image.
* `cui`: Concept Unique Identifier (from the UMLS Metathesaurus), providing standardized medical terminology.
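
If you want to inspect the data yourself, here is a minimal sketch using the `datasets` library; the `train` split name is an assumption, so check the dataset card for the exact splits.

```python
from datasets import load_dataset

# Load the sampled ROCOv2 subset used for this fine-tune
# (assumes a "train" split; see the dataset card for the exact splits)
dataset = load_dataset("unsloth/Radiology_mini", split="train")

# Each example pairs an image with its expert-written caption and CUI codes
sample = dataset[0]
print(sample["image_id"])
print(sample["caption"])
print(sample["cui"])
sample["image"].show()  # the image field is decoded as a PIL image
```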