mosesb committed
Commit 815e023 · verified · 1 Parent(s): 7636577

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ output.png filter=lfs diff=lfs merge=lfs -text
+ output_augmentation.png filter=lfs diff=lfs merge=lfs -text
+ training_plot.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,134 @@
---
license: mit
library_name: timm
tags:
- image-classification
- mobilevit
- timm
- drowsiness-detection
- computer-vision
- pytorch
widget:
- modelId: your-username/mobilevit-drowsiness-detection
  title: Drowsiness Detection with MobileViT v2
  url: https://huggingface.co/spaces/user-name/repo-name/resolve/main/grid_output.jpg
datasets:
- ismailnasri20/driver-drowsiness-dataset-ddd
- yasharjebraeily/drowsy-detection-dataset
metrics:
- accuracy
- f1
- precision
- recall
---

# MobileViT v2 for Drowsiness Detection

This repository contains a `MobileViT v2` classification model fine-tuned to detect driver drowsiness from images. MobileViT v2 is a lightweight, state-of-the-art hybrid architecture that combines convolutions with Vision Transformer blocks, making it both efficient and accurate. The model classifies input images into two categories: `Drowsy` and `Non Drowsy`.

The model was trained in PyTorch using the `timm` library and achieves high accuracy on an unseen test set, making it a reliable foundation for driver safety applications.

## Model Details
* **Architecture:** `mobilevitv2_200`
* **Fine-tuned on:** A combined dataset for driver drowsiness detection.
* **Classes:** `Drowsy`, `Non Drowsy`
* **Frameworks:** PyTorch, timm

## How to Get Started

You can use this model with the `timm` and `torch` libraries. First, make sure you have the `best_model.pt` checkpoint from this repository.

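If you do not have the checkpoint locally yet, the snippet below is a minimal sketch that fetches it with `huggingface_hub`; the `repo_id` shown is a placeholder and should be replaced with this repository's actual id.

```python
# Minimal sketch: download the checkpoint with huggingface_hub.
# NOTE: the repo_id below is a placeholder -- replace it with this repository's id.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="your-username/mobilevit-drowsiness-detection",  # placeholder
    filename="best_model.pt",
)
print(model_path)  # local path to the downloaded checkpoint
```

The full inference example: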
+ ```python
42
+ # Install required libraries
43
+ !pip install timm torch torchvision
44
+
45
+ import torch
46
+ import timm
47
+ from PIL import Image
48
+ from torchvision import transforms
49
+
50
+ # --- 1. Setup Model and Preprocessing ---
51
+ # Define the same transformations used for validation/testing
52
+ val_test_transform = transforms.Compose([
53
+ transforms.Resize((224, 224)),
54
+ transforms.ToTensor(),
55
+ transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
56
+ ])
57
+
58
+ # Define class names (ensure order matches training: Drowsy=0, Non Drowsy=1)
59
+ class_names = ['Drowsy', 'Non Drowsy']
60
+
61
+ # Load the model architecture
62
+ model = timm.create_model('mobilevitv2_200', pretrained=False, num_classes=2)
63
+
64
+ # Load the fine-tuned weights
65
+ model_path = 'best_model.pt'
66
+ model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))
67
+ model.eval()
68
+
69
+ # --- 2. Run Inference ---
70
+ image_path = 'path/to/your/image.jpg'
71
+ image = Image.open(image_path).convert('RGB')
72
+
73
+ # Preprocess the image
74
+ input_tensor = val_test_transform(image).unsqueeze(0) # Add batch dimension
75
+
76
+ # Get model prediction
77
+ with torch.no_grad():
78
+ output = model(input_tensor)
79
+ probabilities = torch.nn.functional.softmax(output[0], dim=0)
80
+ top_prob, top_class_index = torch.topk(probabilities, 1)
81
+
82
+ class_name = class_names[top_class_index.item()]
83
+ confidence = top_prob.item()
84
+
85
+ print(f"Prediction: {class_name} with confidence {confidence:.4f}")
86
+ ```

## Training Procedure

The model was fine-tuned on a combined dataset of over 40,000 driver images. The training process involved:
- **Data Augmentation:** A strong augmentation pipeline was used for training, including `RandomResizedCrop`, `RandomHorizontalFlip`, `ColorJitter`, and `RandomErasing` (see the sketch after this list).
- **Transfer Learning:** The model was initialized with ImageNet-pretrained weights, enabling robust feature extraction and fast convergence.
- **Early Stopping:** Training was halted after 30 epochs with no improvement in validation accuracy, to prevent overfitting.

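A minimal sketch of such an augmentation pipeline with `torchvision.transforms` is shown below; the exact parameter values used in the training notebook are not listed in this card, so the numbers here are illustrative assumptions.

```python
from torchvision import transforms

# Illustrative training-time augmentation pipeline. The transform names match the
# card above; the specific parameter values are assumptions, not the notebook's settings.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.25),  # operates on tensors, so it comes after ToTensor
])
```
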
### Key Hyperparameters
- **Image Size:** 224x224
- **Batch Size:** 64
- **Optimizer:** AdamW (lr=1e-4)
- **Scheduler:** ExponentialLR (gamma=0.90)
- **Loss Function:** CrossEntropyLoss

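For reference, here is a minimal sketch of how these components fit together in PyTorch and `timm`. It is not the full training notebook: the data below is a random placeholder standing in for the real augmented DataLoader, and the loop is abbreviated.

```python
import timm
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Initialise from ImageNet-pretrained weights (transfer learning).
model = timm.create_model('mobilevitv2_200', pretrained=True, num_classes=2)

criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=1e-4)
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.90)

# Placeholder data so the loop runs end to end; the real run used a DataLoader over
# the augmented drowsiness dataset with batch size 64 and 224x224 images.
dummy_data = TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,)))
train_loader = DataLoader(dummy_data, batch_size=4)

model.train()
for epoch in range(2):  # the real run trained for up to 30 epochs with early stopping
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```
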
![Training Results](training_plot.png)

## Evaluation

The model was evaluated on a completely **unseen test set**, drawn from a different dataset than the primary training data, to give a fair assessment of its generalization ability.

### Key Performance Metrics
| Metric | Value | Description |
| :----: | :----: | :------------------------------------------------- |
| **Accuracy** | 98.18% | Overall correctness on the test set. |
| **APCER** | 3.57% | Rate of 'Drowsy' drivers missed (False Negatives). |
| **BPCER** | 0.00% | Rate of 'Non Drowsy' drivers incorrectly flagged (False Positives). |
| **ACER** | 1.78% | Average of APCER and BPCER. |

*APCER (Attack Presentation Classification Error Rate, adapted here) is the most critical safety metric, as it measures the failure to detect a drowsy driver.*

![Confusion Matrix](output_confusion_matrix.png)

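As a quick reference, these error rates are straightforward to compute from confusion-matrix counts; the sketch below uses placeholder counts, not the actual test-set numbers.

```python
# Placeholder confusion-matrix counts for a 'Drowsy' vs 'Non Drowsy' test set.
# tp / fn count 'Drowsy' samples; tn / fp count 'Non Drowsy' samples.
tp, fn = 95, 5     # drowsy correctly detected / drowsy missed
tn, fp = 100, 0    # non-drowsy correctly passed / non-drowsy wrongly flagged

apcer = fn / (tp + fn)      # share of drowsy drivers the model failed to detect
bpcer = fp / (tn + fp)      # share of alert drivers wrongly flagged as drowsy
acer = (apcer + bpcer) / 2  # balanced average of the two error rates

print(f"APCER={apcer:.2%}  BPCER={bpcer:.2%}  ACER={acer:.2%}")
```
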
### Model Explainability (Grad-CAM)
To verify that the model focuses on relevant facial features, Grad-CAM was applied. The heatmaps confirm that the model's predictions are driven primarily by the driver's eyes, mouth, and head position, which are key indicators of drowsiness.

![Grad-CAM Visualization](output_grad_cam.jpg)

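The exact Grad-CAM tooling used in the notebook is not reproduced here, but the hook-based sketch below shows one way to generate a similar heatmap. It reuses `model` and `input_tensor` from the inference example above, and the choice of `model.stages[-1]` as the target layer is an assumption; inspect the model (e.g. `print(model)`) and pick a late convolutional block that suits your setup.

```python
# Continues from the inference example above: `model` (in eval mode) and
# `input_tensor` (1 x 3 x 224 x 224) are reused here.
import torch
import torch.nn.functional as F

activations, gradients = {}, {}

def save_activation(module, inputs, output):
    activations["value"] = output.detach()

def save_gradient(module, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()

# ASSUMPTION: model.stages[-1] is used as the target layer; adjust after inspecting the model.
target_layer = model.stages[-1]
fwd_handle = target_layer.register_forward_hook(save_activation)
bwd_handle = target_layer.register_full_backward_hook(save_gradient)

model.zero_grad()
logits = model(input_tensor)
score = logits[0, logits[0].argmax()]  # logit of the predicted class
score.backward()

# Channel weights = mean gradient per channel; CAM = ReLU of the weighted sum.
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap in [0, 1]

fwd_handle.remove()
bwd_handle.remove()
```
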
## Intended Use and Limitations
This model is intended as a proof of concept for driver safety systems and for academic research. It should not be used as the sole mechanism for preventing accidents in a production environment without further rigorous testing.

Real-world performance may vary based on:
- Lighting conditions (especially at night).
- Camera angle and distance.
- Occlusions (e.g., sunglasses, hats, hands on the face).
- Individual differences not represented in the training data.

*This model card is based on the training notebook [`MobileViT_Drowsiness.ipynb`](https://github.com/mosesab/MobileViT-Drowsiness-Detection/blob/main/MobileViT_Drowsiness.ipynb).*

best_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fcbe35c8e0c8149bed84189ab3cf0a06429107a968667a9f681ff113bed35867
+ size 69935051
output.png ADDED

Git LFS Details

  • SHA256: fc683a3462fc88bec755973c36bfd8e1e70864e3e7a43ddb7503a473631871a3
  • Pointer size: 132 Bytes
  • Size of remote file: 3.04 MB
output_augmentation.png ADDED

Git LFS Details

  • SHA256: 45b22ccd475bbe66563ebd294debc7f1d78b418edd530bd760880475ece5b3dd
  • Pointer size: 131 Bytes
  • Size of remote file: 798 kB
output_confusion_matrix.png ADDED
output_grad_cam.jpg ADDED
training_history.csv ADDED
@@ -0,0 +1,31 @@
+ epoch,train_loss,train_acc,val_loss,val_acc
+ 1,0.005245581285077896,0.9985433537428403,0.0022536597098549594,0.9993005036373812
+ 2,0.005058018809589851,0.9984445980643888,0.0023724412707676803,0.9993005036373812
+ 3,0.00333882693223768,0.9990618210547108,0.004640081306103603,0.9981813094571909
+ 4,0.0019675480249330287,0.9994321548489039,0.0011330571216904648,0.9995803021824288
+ 5,0.0009190954186649758,0.9996790440450326,0.0042279156476276464,0.9987409065472861
+ 6,0.003519932303200358,0.9989136875370335,0.0022254574140079496,0.999440402909905
+ 7,0.0008493974372590355,0.9998024886430971,0.005408237440245763,0.9987409065472861
+ 8,0.0012583149986798944,0.9995309105273553,0.0014416664605325462,0.9995803021824288
+ 9,0.0004477896281065585,0.9998765554019357,0.0015333133928007877,0.9995803021824288
+ 10,0.0010184175027428194,0.9996790440450326,0.0008335669395869618,0.9998601007274763
+ 11,0.0004673596799982551,0.9998518664823228,0.00048003266577130574,0.9998601007274763
+ 12,0.0004278480958328559,0.9998765554019357,0.0010320477756580264,0.9997202014549526
+ 13,0.0006154210043430926,0.9998518664823228,0.001365777820691367,0.999440402909905
+ 14,0.00031554297358610365,0.9999012443215486,0.0020125484583530568,0.9995803021824288
+ 15,0.0008148343436515399,0.9998024886430971,0.0009892107681903222,0.9998601007274763
+ 16,0.00044639887271710017,0.9998518664823228,0.0007288139932215199,0.9995803021824288
+ 17,0.0001811253026875362,0.9999753110803872,0.0005784849157645884,0.9995803021824288
+ 18,0.00046878313578802293,0.9999259332411614,0.0007865349200535725,0.9997202014549526
+ 19,6.448918337161184e-05,1.0,0.0007113339355956221,0.999440402909905
+ 20,0.00033571305326825105,0.9999259332411614,0.0013030710574786868,0.9995803021824288
+ 21,4.827969234206115e-05,0.9999753110803872,0.000493603309694494,0.9995803021824288
+ 22,2.9587593322939357e-05,1.0,0.0005621903976394485,0.9998601007274763
+ 23,0.0002729453408775668,0.9999259332411614,0.0005450556711127411,0.9997202014549526
+ 24,5.782559405570643e-05,0.9999753110803872,0.0006117059190368832,0.9997202014549526
+ 25,9.650301194302824e-05,0.9999753110803872,0.0015031366452237724,0.9995803021824288
+ 26,0.00018091677156248143,0.9999753110803872,0.000420644104269485,0.9998601007274763
+ 27,0.00040603304785788484,0.9999259332411614,0.0009131295740309233,0.9995803021824288
+ 28,1.6794279317459968e-05,1.0,0.0007172291170396112,0.9995803021824288
+ 29,4.037580577003857e-05,1.0,0.0006496535298990078,0.9995803021824288
+ 30,3.2526824202515245e-05,0.9999753110803872,0.0006385279186205687,0.9995803021824288
training_plot.png ADDED

Git LFS Details

  • SHA256: f013679c412977bf7ed0d474fbb1d00a8fce95c7cbebcc69a141aef3d4a5f13a
  • Pointer size: 131 Bytes
  • Size of remote file: 131 kB