ViT X-ray Multi-label (vit-xray-v1)

Model Description

This model is a fine-tuned Vision Transformer (google/vit-base-patch16-224-in21k, ~85.8M parameters) for multi-label classification of chest X-rays.
It predicts the presence of multiple findings such as:

  • Nodule
  • Infiltration
  • Effusion
  • Atelectasis

Author: Om Kumar (Hugging Face: @itsomk)

The model is designed for research and educational purposes only and should not be used as a substitute for clinical diagnosis.


Intended Use

  • Research in medical imaging and computer vision
  • Educational purposes for understanding X-ray image classification
  • Baseline model for further fine-tuning or domain adaptation (a fine-tuning sketch follows below)

โš ๏ธ Not intended for clinical use. Predictions should not guide medical decisions.


Training Data

  • Dataset: chest X-ray images drawn from publicly available datasets (e.g., NIH ChestX-ray14).
  • Images were preprocessed (resized to 224×224, normalized).
  • Labels are multi-label, meaning a single X-ray can carry more than one finding (see the encoding sketch below).
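
To make the multi-label setup concrete, here is a small sketch (not the author's actual pipeline) of how such targets are typically encoded as multi-hot vectors over this model's four classes:

import torch

LABELS = ["Nodule", "Infiltration", "Effusion", "Atelectasis"]

def encode(findings):
    """Map a list of finding names to a multi-hot target vector."""
    target = torch.zeros(len(LABELS))
    for name in findings:
        target[LABELS.index(name)] = 1.0
    return target

# A single X-ray can carry several findings at once:
print(encode(["Effusion", "Atelectasis"]))  # tensor([0., 0., 1., 1.])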

Model Performance

  • Optimized for detecting common thoracic abnormalities.
  • Evaluation metric: per-class AUC (see the sketch after this list).
  • Nodule AUC: 0.696
  • Infiltration AUC: 0.684
  • Effusion AUC: 0.843
  • Atelectasis AUC: 0.762
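
The card does not ship an evaluation script; a typical way to reproduce per-class AUCs like those above is scikit-learn's roc_auc_score over held-out multi-hot labels and sigmoid probabilities. The arrays below are random placeholders, not real evaluation data:

import numpy as np
from sklearn.metrics import roc_auc_score

LABELS = ["Nodule", "Infiltration", "Effusion", "Atelectasis"]
y_true = np.random.randint(0, 2, size=(200, 4))  # placeholder ground truth
y_prob = np.random.rand(200, 4)                  # placeholder sigmoid outputs

for i, name in enumerate(LABELS):
    print(f"{name} AUC: {roc_auc_score(y_true[:, i], y_prob[:, i]):.3f}")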

Quick Usage

from transformers import AutoImageProcessor, AutoModelForImageClassification
import torch
from PIL import Image

MODEL = "itsomk/vit-xray-v1"
processor = AutoImageProcessor.from_pretrained(MODEL)
model = AutoModelForImageClassification.from_pretrained(MODEL)
model.eval()

# Load the X-ray and apply the model's preprocessing (resize to 224x224, normalize).
img = Image.open("path/to/xray.jpg").convert("RGB")
inputs = processor(images=img, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label head: apply an independent sigmoid per class (not a softmax).
probs = torch.sigmoid(logits).squeeze().tolist()
results = {model.config.id2label[i]: float(probs[i]) for i in range(len(probs))}
print(results)
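
The dictionary above holds independent per-class probabilities. To report discrete findings you still need a decision threshold; 0.5 is a common default, but this card does not publish tuned per-class thresholds:

THRESHOLD = 0.5  # assumption: the card does not specify tuned thresholds
positive = [name for name, p in results.items() if p >= THRESHOLD]
print("Findings above threshold:", positive)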

