RadFig-classifier
A deep learning model that classifies medical images as suitable or unsuitable for Visual Question Answering (VQA) tasks. Use it to filter medical image collections down to the images that are appropriate for VQA applications.
Overview
RadFig-classifier is based on the EfficientNetV2-S architecture and was trained on medical imaging data to determine whether an image contains enough visual information for meaningful question answering. Predictions come from an ensemble of the five models produced by 5-fold cross-validation, which makes them more robust than those of any single fold model.
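This README does not state how the five fold predictions are combined. A common choice, assumed in this minimal sketch, is to turn each fold model's single output logit into a probability with a sigmoid and then average the five probabilities. `ensemble_probability` is a hypothetical helper name, not part of `inference.py`.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function: maps a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe for large negative x
    return z / (1.0 + z)


def ensemble_probability(fold_logits: Sequence[float]) -> float:
    """Average per-fold probabilities (assumed combination rule).

    fold_logits: one raw output logit per cross-validation fold model.
    """
    if not fold_logits:
        raise ValueError("need at least one fold prediction")
    probs = [sigmoid(z) for z in fold_logits]
    return sum(probs) / len(probs)


# Five hypothetical logits, one per fold model.
print(round(ensemble_probability([1.2, 2.0, 1.5, 1.8, 2.3]), 4))
```

Averaging the logits first and applying the sigmoid once is an equally plausible alternative; the two rules usually give similar but not identical scores.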
Installation
Requirements
pip install torch torchvision timm opencv-python albumentations pandas tqdm pillow numpy
Command Line Usage
Single Image Classification
# Get probability score
python inference.py --input image.jpg
# Get binary classification
python inference.py --input image.jpg --binary
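To call the script from Python rather than the shell, a thin wrapper can assemble the same command line. This is a minimal sketch that uses only the flags documented in this README; `build_command` and `run_inference` are hypothetical helper names, and the script is assumed to be run from the repository root.

```python
import subprocess
from typing import List, Optional


def build_command(input_path: str, binary: bool = False,
                  output: Optional[str] = None,
                  models: Optional[str] = None) -> List[str]:
    """Assemble an inference.py call using only the documented flags."""
    cmd = ["python", "inference.py", "--input", input_path]
    if models is not None:
        cmd += ["--models", models]
    if output is not None:
        cmd += ["--output", output]
    if binary:
        cmd.append("--binary")
    return cmd


def run_inference(input_path: str, **kwargs) -> str:
    """Run the script and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(build_command(input_path, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_inference("image.jpg", binary=True)` returns whatever the script prints for that image.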
Batch Processing
# Process all images in directory
python inference.py --input /path/to/images/ --output results.csv
# Binary classification with CSV output
python inference.py --input /path/to/images/ --output results.csv --binary
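The README does not say which file extensions `inference.py` picks up from a directory, or whether it recurses into subdirectories. The sketch below previews which files a batch run would plausibly cover, assuming a hypothetical extension set and a non-recursive scan; `list_images` and `IMAGE_EXTENSIONS` are illustrative names, and the set should be adjusted to match the script.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed extension set; inference.py may accept a different one.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def list_images(directory: str,
                extensions: Iterable[str] = IMAGE_EXTENSIONS) -> List[Path]:
    """Return image files directly inside `directory` (no recursion), sorted."""
    exts = {e.lower() for e in extensions}
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix.lower() in exts)
```

Matching on the lower-cased suffix means files such as `scan.PNG` are included alongside `scan.png`.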
Model Architecture
- Base Model: EfficientNetV2-S
- Input Size: 512×512 pixels
- Output: Single probability score (0-1)
- Training: 5-fold cross-validation ensemble
- Framework: PyTorch + timm
Directory Structure
RadFig-classifier/
├── inference.py          # Main inference script
├── models/               # Pre-trained model weights
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold0_best_loss.pth
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold1_best_loss.pth
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold2_best_loss.pth
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold3_best_loss.pth
│   └── tf_efficientnetv2_s.in21k_ft_in1k_fold4_best_loss.pth
├── README.md
└── requirements.txt
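Before passing a directory via `--models`, it can help to confirm that all five fold checkpoints are present. The sketch below derives the expected filenames from the `models/` listing above (the prefix appears to match the timm model identifier `tf_efficientnetv2_s.in21k_ft_in1k`). `fold_checkpoint_paths` and `missing_checkpoints` are hypothetical helpers, and it is an assumption that `inference.py` looks for exactly these names.

```python
from pathlib import Path
from typing import List

# Filename prefix taken from the models/ listing in this README.
CHECKPOINT_PREFIX = "tf_efficientnetv2_s.in21k_ft_in1k"
N_FOLDS = 5


def fold_checkpoint_paths(models_dir: str = "models") -> List[Path]:
    """Return the expected fold0..fold4 checkpoint paths, in fold order."""
    return [Path(models_dir) / f"{CHECKPOINT_PREFIX}_fold{k}_best_loss.pth"
            for k in range(N_FOLDS)]


def missing_checkpoints(models_dir: str = "models") -> List[Path]:
    """List expected checkpoints that are not present on disk."""
    return [p for p in fold_checkpoint_paths(models_dir) if not p.is_file()]
```

An empty result from `missing_checkpoints("models")` means every expected weight file is in place.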
Output Format
Single Image Output
Image: medical_scan.jpg
Probability suitable for VQA: 0.8542
Classification: suitable
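If you capture the printed report (for example from a subprocess), it can be parsed back into structured fields. This sketch assumes the exact three-line layout shown above; `SingleResult` and `parse_single_output` are hypothetical names, and the `Classification:` line is treated as optional because the README does not say whether it is printed without `--binary`.

```python
import re
from typing import NamedTuple, Optional


class SingleResult(NamedTuple):
    image: str
    probability: float
    label: Optional[str]  # None when no "Classification:" line was printed


def parse_single_output(text: str) -> SingleResult:
    """Parse the single-image report into (image, probability, label)."""
    image = re.search(r"^Image:\s*(.+)$", text, re.MULTILINE)
    prob = re.search(r"^Probability suitable for VQA:\s*([0-9.]+)\s*$",
                     text, re.MULTILINE)
    label = re.search(r"^Classification:\s*(\w+)\s*$", text, re.MULTILINE)
    if image is None or prob is None:
        raise ValueError("unrecognised inference.py output")
    return SingleResult(image.group(1).strip(), float(prob.group(1)),
                        label.group(1) if label else None)
```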
CSV Output
image_path | filename | prediction | suitable_for_vqa |
---|---|---|---|
/path/img1.jpg | img1.jpg | 0.8542 | True |
/path/img2.jpg | img2.jpg | 0.2156 | False |
/path/img3.jpg | img3.jpg | 0.9234 | True |
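For dataset curation, the results file can be filtered with the standard library alone. This minimal sketch assumes the CSV is comma-separated, has a header row with the column names shown above, and writes the `suitable_for_vqa` flag as the text `True`/`False`; `select_images` is a hypothetical helper. Passing `min_probability` applies your own cutoff to the `prediction` column instead of the script's flag.

```python
import csv
from typing import List, Optional


def select_images(csv_path: str,
                  min_probability: Optional[float] = None) -> List[str]:
    """Return image_path values for rows judged suitable for VQA."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if min_probability is not None:
        # Custom cutoff on the ensemble probability column.
        return [r["image_path"] for r in rows
                if float(r["prediction"]) >= min_probability]
    # Otherwise rely on the boolean flag written by inference.py.
    return [r["image_path"] for r in rows
            if r["suitable_for_vqa"].strip().lower() == "true"]
```

A stricter cutoff such as `min_probability=0.9` trades recall for a cleaner curated set.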
Command Line Arguments
Argument | Description | Required |
---|---|---|
--input | Input image file or directory | Yes |
--models | Directory containing model files | No (default: "models") |
--output | Output CSV file path | No |
--binary | Return binary predictions instead of probabilities | No |
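For readers who want to wrap or extend the command-line interface, the arguments in the table map onto an `argparse` parser along these lines. This is a hypothetical reconstruction from the documented arguments only, not the actual `inference.py` source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented RadFig-classifier arguments."""
    parser = argparse.ArgumentParser(
        description="Classify medical images as suitable for VQA.")
    parser.add_argument("--input", required=True,
                        help="Input image file or directory")
    parser.add_argument("--models", default="models",
                        help="Directory containing model files")
    parser.add_argument("--output", default=None,
                        help="Output CSV file path")
    parser.add_argument("--binary", action="store_true",
                        help="Return binary predictions instead of probabilities")
    return parser
```

`--binary` is modeled as a `store_true` switch because the usage examples pass it without a value.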
Use Cases
- Medical VQA Systems: Pre-filter images before VQA processing
- Dataset Curation: Automatically filter medical image datasets
- Quality Control: Assess image quality for medical AI applications
- Research: Filter images for medical computer vision studies
Citation
If you use RadFig-classifier in your research, please cite:
coming soon...
License
This project is licensed under the MIT License - see the LICENSE file for details.