RadFig-classifier

A deep learning model that classifies medical images as suitable or unsuitable for Visual Question Answering (VQA). It can be used to filter medical image collections down to the images that are appropriate for VQA applications.

Overview

RadFig-classifier is based on the EfficientNetV2-S architecture and is trained on medical imaging data to predict whether an image contains enough visual information to support meaningful question answering. At inference time it ensembles the predictions of the five models trained during 5-fold cross-validation, which is typically more robust than relying on any single fold's model.

Installation

Requirements

pip install torch torchvision timm opencv-python albumentations pandas tqdm pillow numpy

Command Line Usage

Single Image Classification

# Get probability score
python inference.py --input image.jpg

# Get binary classification
python inference.py --input image.jpg --binary

Batch Processing

# Process all images in directory
python inference.py --input /path/to/images/ --output results.csv

# Binary classification with CSV output
python inference.py --input /path/to/images/ --output results.csv --binary
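To run the documented command over several image directories from a Python script, one option is to wrap it with `subprocess`. This is a minimal sketch that uses only the flags listed in this README (`--input`, `--output`, `--binary`, `--models`); the helper names `build_command` and `classify_directories`, and the `<dirname>_results.csv` output naming, are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_path, output=None, binary=False, models="models"):
    """Assemble an argv list for inference.py using only the documented flags."""
    cmd = [sys.executable, "inference.py",
           "--input", str(input_path), "--models", str(models)]
    if output is not None:
        cmd += ["--output", str(output)]
    if binary:
        cmd.append("--binary")
    return cmd


def classify_directories(directories, binary=False):
    """Run inference.py once per directory, writing <dirname>_results.csv (hypothetical naming)."""
    for directory in directories:
        directory = Path(directory)
        output = f"{directory.name}_results.csv"
        # check=True raises CalledProcessError if the script exits with a non-zero status
        subprocess.run(build_command(directory, output=output, binary=binary), check=True)
```

For example, `classify_directories(["/path/to/ct", "/path/to/xray"], binary=True)` would produce `ct_results.csv` and `xray_results.csv` in the current working directory.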

Model Architecture

Base Model: EfficientNetV2-S
Input Size: 512×512 pixels
Output: Single probability score (0-1)
Training: 5-fold cross-validation ensemble
Framework: PyTorch + timm
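The rule for combining the five fold models is not specified here. A common approach, assumed in this sketch, is to apply a sigmoid to each model's single output logit and average the resulting probabilities; the 0.5 cutoff for the binary label is likewise an assumption rather than a documented value.

```python
import math
from statistics import fmean


def sigmoid(x):
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ensemble_probability(fold_logits):
    """Average per-fold sigmoid probabilities (assumed combination rule)."""
    if not fold_logits:
        raise ValueError("need at least one fold prediction")
    return fmean([sigmoid(z) for z in fold_logits])


def classify(fold_logits, threshold=0.5):
    """Return (probability, label); the 0.5 threshold is an assumption."""
    p = ensemble_probability(fold_logits)
    return p, "suitable" if p >= threshold else "unsuitable"
```

Averaging probabilities and averaging logits before the sigmoid can give slightly different scores, so check `inference.py` if exact agreement with its output matters.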

Directory Structure

RadFig-classifier/
├── inference.py           # Main inference script
├── models/                # Pre-trained model weights
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold0_best_loss.pth
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold1_best_loss.pth
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold2_best_loss.pth
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold3_best_loss.pth
│   └── tf_efficientnetv2_s.in21k_ft_in1k_fold4_best_loss.pth
├── README.md
└── requirements.txt
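The weights follow a `<model_name>_fold<k>_best_loss.pth` naming pattern. When pointing `--models` at a custom directory, a small check that all five folds are present can save a failed run. The helper `find_fold_checkpoints` below is hypothetical, and its regular expression is inferred from the file names listed above.

```python
import re
from pathlib import Path

# Pattern inferred from the listed file names, e.g.
# tf_efficientnetv2_s.in21k_ft_in1k_fold3_best_loss.pth
FOLD_PATTERN = re.compile(r"_fold(\d+)_best_loss\.pth$")


def find_fold_checkpoints(models_dir="models", expected_folds=5):
    """Return checkpoint paths sorted by fold index; raise if any fold is missing."""
    found = {}
    for path in Path(models_dir).glob("*.pth"):
        match = FOLD_PATTERN.search(path.name)
        if match:
            found[int(match.group(1))] = path
    missing = sorted(set(range(expected_folds)) - set(found))
    if missing:
        raise FileNotFoundError(f"missing checkpoints for folds: {missing}")
    return [found[k] for k in sorted(found)]
```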

Output Format

Single Image Output

Image: medical_scan.jpg
Probability suitable for VQA: 0.8542
Classification: suitable
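When the single-image mode is called from another script, its printed report can be parsed back into values. This sketch assumes the `Key: value` layout shown above stays stable; `parse_single_output` is a hypothetical helper, not part of `inference.py`.

```python
def parse_single_output(text):
    """Parse the 'Key: value' report printed for a single image."""
    fields = {}
    for line in text.strip().splitlines():
        # partition splits on the first colon only, so values may contain colons
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    result = {"image": fields.get("Image")}
    if "Probability suitable for VQA" in fields:
        result["probability"] = float(fields["Probability suitable for VQA"])
    if "Classification" in fields:
        result["label"] = fields["Classification"]
    return result
```

The text can come from `subprocess.run([...], capture_output=True, text=True).stdout`.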

CSV Output

image_path	filename	prediction	suitable_for_vqa
/path/img1.jpg	img1.jpg	0.8542	True
/path/img2.jpg	img2.jpg	0.2156	False
/path/img3.jpg	img3.jpg	0.9234	True
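In the example above, the `prediction` column holds the probability score, so the CSV can be re-thresholded without re-running the models, for instance to keep only high-confidence images. This sketch assumes comma-delimited output with the column names shown; `select_suitable` is a hypothetical helper and the 0.8 default is purely illustrative.

```python
import csv


def select_suitable(csv_path, threshold=0.8, delimiter=","):
    """Return image paths whose probability meets a custom threshold.

    Assumes the columns image_path, filename, prediction, suitable_for_vqa,
    with a probability stored in `prediction`.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        return [row["image_path"] for row in reader
                if float(row["prediction"]) >= threshold]
```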

Command Line Arguments

Argument	Description	Required
`--input`	Input image file or directory	Yes
`--models`	Directory containing model files	No (default: "models")
`--output`	Output CSV file path	No
`--binary`	Return binary predictions instead of probabilities	No
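The argument table maps onto a command-line interface along the lines of the `argparse` sketch below. It is reconstructed from the table, not copied from `inference.py`, and can serve as a starting point if you wrap the classifier in your own script with the same flags.

```python
import argparse


def build_parser():
    """Mirror the documented CLI flags (reconstructed, not copied from inference.py)."""
    parser = argparse.ArgumentParser(description="RadFig-classifier inference")
    parser.add_argument("--input", required=True,
                        help="Input image file or directory")
    parser.add_argument("--models", default="models",
                        help="Directory containing model files")
    parser.add_argument("--output", default=None,
                        help="Output CSV file path")
    parser.add_argument("--binary", action="store_true",
                        help="Return binary predictions instead of probabilities")
    return parser
```

With `required=True`, argparse prints a usage error and exits if `--input` is omitted.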

Use Cases

Medical VQA Systems: Pre-filter images before VQA processing
Dataset Curation: Automatically filter medical image datasets
Quality Control: Assess image quality for medical AI applications
Research: Filter images for medical computer vision studies
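For dataset curation, the batch CSV can drive a copy step that builds a filtered dataset. This sketch assumes the CSV layout shown under Output Format, with `suitable_for_vqa` written as `True`/`False`; `curate_dataset` and the flat destination layout are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def curate_dataset(csv_path, dest_dir):
    """Copy every image with suitable_for_vqa == True into dest_dir.

    Returns the number of files copied. Only the base filename is kept,
    so images with the same name from different folders overwrite each other.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["suitable_for_vqa"].strip().lower() == "true":
                shutil.copy2(row["image_path"], dest / row["filename"])
                copied += 1
    return copied
```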

Citation

If you use RadFig-classifier in your research, please cite:

Citation details are coming soon.

License

This project is licensed under the MIT License - see the LICENSE file for details.