Initial commit of vit-xray-v1

Files changed (7)

.gitattributes ADDED

+*.safetensors filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text

LICENSE ADDED

+MIT License
+Copyright (c) 2025 OM KUMAR
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md ADDED

+# ViT X-ray Multi-label (vit-xray-v1)
+**Author:** OM KUMAR (Hugging Face: @itsomk)
+**Model type:** Vision Transformer (google/vit-base-patch16-224-in21k fine-tuned)
+**Task:** Multi-label chest X-ray classification (Nodule, Infiltration, Effusion, Atelectasis)
+**License:** MIT
+## Quick usage
+```python
+from transformers import AutoImageProcessor, AutoModelForImageClassification
+import torch
+from PIL import Image
+MODEL = "itsomk/vit-xray-v1"
+processor = AutoImageProcessor.from_pretrained(MODEL)
+model = AutoModelForImageClassification.from_pretrained(MODEL)
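+# The processor expects 3-channel input, so convert grayscale X-rays to RGB.
+img = Image.open("path/to/xray.jpg").convert("RGB")
+inputs = processor(images=img, return_tensors="pt")
+with torch.no_grad():
+    logits = model(**inputs).logits
+# Multi-label task: apply an independent sigmoid per label, not a softmax.
+probs = torch.sigmoid(logits).squeeze().tolist()
+# id2label keys are integers once the config has been loaded.
+labels = [model.config.id2label[i] for i in range(len(probs))]
+print(list(zip(labels, probs)))
+```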
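
Review note: the README snippet prints one probability per label but does not say how to turn them into findings. A minimal sketch of one common approach follows, assuming a fixed cutoff of 0.5. The helper name `select_labels`, the cutoff, and the sample probabilities are all hypothetical and not part of this commit; a real deployment would likely tune a threshold per label on validation data.

```python
def select_labels(probs, labels, threshold=0.5):
    """Return (label, prob) pairs whose probability meets the threshold.

    In a multi-label setting each label is decided independently, so zero,
    one, or several labels may be returned for the same image.
    """
    return [(label, p) for label, p in zip(labels, probs) if p >= threshold]


# Label order taken from config.json in this commit.
labels = ["Nodule", "Infiltration", "Effusion", "Atelectasis"]
# Made-up probabilities for illustration only, not model output.
probs = [0.91, 0.12, 0.64, 0.08]

print(select_labels(probs, labels))
```

Because the outputs come from independent sigmoids, they do not need to sum to 1, which is why a per-label cutoff is used here instead of an argmax.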

config.json ADDED

+{
+  "_name_or_path": "google/vit-base-patch16-224-in21k",
+  "architectures": [
+    "ViTForImageClassification"
+  ],
+  "attention_probs_dropout_prob": 0.0,
+  "encoder_stride": 16,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.0,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "Nodule",
+    "1": "Infiltration",
+    "2": "Effusion",
+    "3": "Atelectasis"
+  },
+  "image_size": 224,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "Atelectasis": 3,
+    "Effusion": 2,
+    "Infiltration": 1,
+    "Nodule": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "model_type": "vit",
+  "num_attention_heads": 12,
+  "num_channels": 3,
+  "num_hidden_layers": 12,
+  "patch_size": 16,
+  "problem_type": "multi_label_classification",
+  "qkv_bias": true,
+  "torch_dtype": "float32",
+  "transformers_version": "4.38.2"
+}
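
Review note: JSON object keys are always strings, so `id2label` is stored on disk with keys `"0"` to `"3"`, while `label2id` maps back to integers. The sketch below checks that the two maps in this file are inverses of each other after converting the keys to `int`, which is, as far as I know, also what `transformers` does when it loads a config. The helper names `load_label_maps` and `maps_are_inverse` are hypothetical.

```python
import json

# Fragment copied from config.json in this commit (only the two label maps).
CONFIG_FRAGMENT = """
{
  "id2label": {"0": "Nodule", "1": "Infiltration", "2": "Effusion", "3": "Atelectasis"},
  "label2id": {"Atelectasis": 3, "Effusion": 2, "Infiltration": 1, "Nodule": 0}
}
"""


def load_label_maps(config_text):
    """Parse the label maps and normalize every class index to int."""
    cfg = json.loads(config_text)
    id2label = {int(k): v for k, v in cfg["id2label"].items()}
    label2id = {k: int(v) for k, v in cfg["label2id"].items()}
    return id2label, label2id


def maps_are_inverse(id2label, label2id):
    """True when label2id is exactly the inverse mapping of id2label."""
    return {v: k for k, v in id2label.items()} == label2id


id2label, label2id = load_label_maps(CONFIG_FRAGMENT)
print(maps_are_inverse(id2label, label2id))
```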

model.safetensors ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:2d91dbfa7fb4fe32a15e1059200db5f415852f1f4cf440d2a061028803375f74
+size 343230128
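
Review note: the three lines above are a Git LFS pointer, not the weights themselves. The `oid` is the SHA-256 of the real file's contents and `size` is its length in bytes, so a downloaded `model.safetensors` can be checked against the pointer. A minimal sketch, with hypothetical helper names `parse_lfs_pointer` and `matches_pointer`:

```python
import hashlib

# Pointer text copied from this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:2d91dbfa7fb4fe32a15e1059200db5f415852f1f4cf440d2a061028803375f74
size 343230128
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Stream the file at `path` and compare its size and digest to the pointer."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy would then report whether the download is intact; the file path is an assumption about where the weights were saved.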

preprocessor_config.json ADDED

+{
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "image_processor_type": "ViTImageProcessor",
+  "image_std": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "resample": 2,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 224,
+    "width": 224
+  }
+}
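
Review note: with `rescale_factor` equal to 1/255 and a mean and standard deviation of 0.5 on every channel, the processor maps 8-bit pixel values from [0, 255] to roughly [-1, 1] (`resample: 2` corresponds to PIL's bilinear filter for the resize step). The arithmetic for a single channel value can be sketched as below; `preprocess_pixel` is a hypothetical helper, not a function from `transformers`.

```python
# Values copied from preprocessor_config.json in this commit.
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255
IMAGE_MEAN = 0.5
IMAGE_STD = 0.5


def preprocess_pixel(value):
    """Apply do_rescale then do_normalize to one 8-bit channel value."""
    rescaled = value * RESCALE_FACTOR          # [0, 255] -> [0, 1]
    return (rescaled - IMAGE_MEAN) / IMAGE_STD  # [0, 1]   -> [-1, 1]


for v in (0, 128, 255):
    print(v, round(preprocess_pixel(v), 4))
```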

requirements.txt ADDED

+transformers>=4.38.2
+torch
+Pillow
+safetensors