🚗 TwinCar: Fine-Grained Car Classification with Visual Explainability

TwinCar is a deep learning pipeline for car make/model/year recognition on the Stanford Cars 196 dataset.
Features: Transfer learning, robust augmentation, Grad-CAM++ explainability, metric-rich evaluation, and easy deployment.
Developed at Brainster Data Science Academy, 2025.




Project Overview

TwinCar tackles fine-grained visual classification: distinguishing between 196 visually similar car classes in real-world photos.
This requires:

  • High-capacity neural networks for subtle visual cues
  • Strategies to mitigate class imbalance and overfitting
  • Interpretable predictions for trust and debugging

Our pipeline pairs a fine-tuned ResNet50 with Grad-CAM++ visualizations, logged metrics, and saved artifacts, so results can be inspected and reproduced.


Theoretical Foundation

Fine-Grained Recognition

Unlike broad classification, fine-grained tasks (like make/model/year) challenge models to:

  • Discriminate subtle features (e.g., headlight shapes, grill details)
  • Ignore irrelevant background/context
  • Handle many classes with potential class imbalance

Transfer learning is crucial: starting from a pretrained ResNet50 reuses generic visual features, while the custom head specializes in fine detail.

Model Explainability

Explainability is not optional in modern AI:

  • Grad-CAM++ highlights image regions that drive predictions.
  • This builds trust, helps spot spurious correlations, and aids model debugging.

Our Grad-CAM++ overlays reveal _what the network “sees” as important_, which is vital for deployment in domains like traffic, insurance, or autonomous vehicles.


Dataset & Preprocessing

  • Source: Stanford Cars 196
  • Details: 16,185 labeled images, 196 classes (make/model/year)
  • Train/Val Split: Stratified, 10% for validation
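
A stratified split like the one listed above can be sketched in plain Python. This is a minimal illustration, not the project's actual splitting code: the function name `stratified_split`, the seed, and the rule of keeping at least one validation image per class are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.10, seed=42):
    """Split sample indices so each class sends roughly val_fraction to validation."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Aim for at least one validation image per class,
        # but never move every image of a class out of training.
        n_val = max(1, round(len(indices) * val_fraction))
        n_val = min(n_val, len(indices) - 1)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)
```

Called with the list of integer class labels, it returns two disjoint index lists that can be used to build training and validation subsets.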

Preprocessing pipeline:

  • Data integrity: remove missing/corrupt files
  • Augmentations:
    • Random resized crop
    • Horizontal flip, rotation
    • Color jitter, blur
  • Normalization: ImageNet statistics (mean/std)
  • Class balancing: Weighted sampling during training

Model Architecture

ResNet50 Backbone (with Custom Classifier Head)
  • Frozen layers: Early layers (generic feature extraction)
  • Trainable layers: Last two ResNet blocks + custom head
  • Classifier Head:
    • Linear → ReLU → Dropout → Linear (196 logits)
  • Optimization:
    • Adam with layer-wise learning rates
    • Cross-Entropy Loss + Label Smoothing (improves calibration)
    • Early Stopping on macro F1

Schematic:

Input Image
  ↓
[Augmentation]
  ↓
ResNet50 (layers 1–2 frozen)
  ↓
[Trainable layers 3–4]
  ↓
Custom Classifier Head (2-layer MLP)
  ↓
Softmax (196-way)


Training & Evaluation Pipeline

  • Epochs: Up to 25 (early stopping by macro F1)
  • Batch Size: 32, with class weighting
  • Validation: Macro/micro accuracy, F1, precision, recall, confusion matrix, Top-3/5 accuracy
  • Logging: All metrics + curve plots (CSV and PNG)
  • Artifacts:
    • Model weights (twin_car_best_model_v2.pth)
    • Class mapping (class_mapping.json)
    • Evaluation plots (curves, confusion matrix, Grad-CAM, etc.)
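
Early stopping on macro F1 can be illustrated with a small, dependency-free sketch. The `patience` value and the class and function names are assumptions; the source only states that training runs for up to 25 epochs with early stopping on macro F1.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in labels or predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


class EarlyStopping:
    """Signal a stop once the monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's score; return True if training should stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, the best weights would typically be saved whenever `step` records an improvement, and the loop would break when it returns `True`.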

Model Interpretation & Explainability

Grad-CAM++: See What the Model Sees

Theory:
Grad-CAM++ generates class-discriminative heatmaps highlighting the image regions most important for a model’s decision.
This allows us to:

  • Verify if the model focuses on the car, not the background
  • Understand failure cases (misclassifications)
  • Communicate model trust to stakeholders

Example:

The above Grad-CAM++ overlay shows the model focusing on distinctive car regions (e.g., front grill, headlights) when classifying.
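
For readers who want to see the mechanics, here is a from-scratch sketch of Grad-CAM++ in PyTorch. It is not the project's code (the Quickstart installs a Grad-CAM library instead). It uses the closed-form weighting commonly found in open-source implementations, where higher-order derivatives are approximated by powers of the first-order gradient. The function name and the suggested target layer (`model.layer4[-1]` for a ResNet50) are assumptions.

```python
import torch
import torch.nn.functional as F


def grad_cam_plus_plus(model, target_layer, input_tensor, class_idx=None):
    """Return (heatmap in [0, 1] with the input's H x W, class index) for one image."""
    store = {}

    def forward_hook(module, inputs, output):
        # Keep the activations and capture their gradient during backward.
        store["act"] = output
        output.register_hook(lambda grad: store.__setitem__("grad", grad))

    handle = target_layer.register_forward_hook(forward_hook)
    try:
        model.zero_grad()
        logits = model(input_tensor)
        if class_idx is None:
            class_idx = int(logits.argmax(dim=1))
        logits[0, class_idx].backward()
    finally:
        handle.remove()

    act = store["act"].detach()    # shape (1, C, h, w)
    grad = store["grad"].detach()  # same shape as act

    # Grad-CAM++ pixel-wise coefficients (closed-form approximation).
    grad2 = grad ** 2
    grad3 = grad2 * grad
    act_sum = act.sum(dim=(2, 3), keepdim=True)
    alpha = grad2 / (2 * grad2 + act_sum * grad3 + 1e-7)
    alpha = torch.where(grad != 0, alpha, torch.zeros_like(alpha))

    # Channel weights: coefficient-weighted sum of the positive gradients.
    weights = (alpha * F.relu(grad)).sum(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * act).sum(dim=1, keepdim=True))

    # Upsample to the input resolution and rescale to [0, 1].
    cam = F.interpolate(cam, size=input_tensor.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-7)
    return cam[0, 0], class_idx
```

The model should be in `eval()` mode, and the call must happen outside `torch.no_grad()` because the method needs gradients. The returned heatmap can be blended with the original image to produce an overlay.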


Key Visualizations

Confusion Matrix

Visualizes prediction accuracy for each class and reveals common confusion points.
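
As a rough illustration of how the confusion points can be extracted, the sketch below builds a confusion matrix from integer labels and lists the most frequent off-diagonal pairs. The function names are hypothetical and not taken from the project.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def most_confused_pairs(matrix, top_n=5):
    """Return up to top_n (true_class, predicted_class, count) tuples, worst first."""
    pairs = [
        (count, true_cls, pred_cls)
        for true_cls, row in enumerate(matrix)
        for pred_cls, count in enumerate(row)
        if true_cls != pred_cls and count > 0
    ]
    # Sort by descending count, then by class indices for a stable order.
    pairs.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(t, p, c) for c, t, p in pairs[:top_n]]
```

With 196 classes the full matrix is hard to read at a glance, so a short list of the worst pairs is often a more practical starting point for error analysis.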


Training & Validation Loss/Accuracy

Tracks the model’s learning progress and helps detect overfitting or underfitting.


Precision/Recall by Epoch

Shows how precision and recall (macro and weighted) evolve during training.


Top-3/5 Accuracy

Indicates how often the true label is among the top-3 or top-5 predictions, which is useful for real-world ranking.
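
Top-k accuracy is simple to compute by hand. The sketch below assumes scores are given as one list of per-class values (logits or probabilities) per image; the function name is hypothetical.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)
```

With `k=1` this reduces to ordinary accuracy, so top-3 and top-5 values can never be lower than the top-1 figure.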


Another Grad-CAM++ Example

Demonstrates model interpretability: heatmaps show focus on meaningful car regions.


Results & Metrics

| Metric | Value |
|---|---|
| Train Loss | 0.98 |
| Train Accuracy | 99.7% |
| Val Loss | 1.72 |
| Val Accuracy | 79.1% |
| Val Precision (macro) | 82.4% |
| Val Recall (macro) | 79.1% |
| Val F1 (macro) | 78.5% |
| Cohen’s Kappa | 0.79 |
| MCC | 0.79 |
| Top-3 Accuracy | 90.9% |
| Top-5 Accuracy | 93.4% |
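
Cohen's kappa and the multiclass Matthews correlation coefficient (MCC) reported above can both be derived from a confusion matrix. The following is a minimal sketch using the standard definitions; it is not the project's evaluation code, and the function name is hypothetical.

```python
import math


def kappa_and_mcc(matrix):
    """matrix[t][p] = number of samples with true class t predicted as class p."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n_classes))
    true_counts = [sum(matrix[k]) for k in range(n_classes)]
    pred_counts = [sum(matrix[t][k] for t in range(n_classes))
                   for k in range(n_classes)]
    chance = sum(t * p for t, p in zip(true_counts, pred_counts))

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = correct / total
    p_expected = chance / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected < 1 else 0.0

    # Multiclass MCC computed from the same marginal counts.
    numerator = correct * total - chance
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    mcc = numerator / denominator if denominator else 0.0
    return kappa, mcc
```

Both scores share the same numerator up to scaling and differ only in how they normalize it, which is why they often land close together, as they do in the table.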

Interpretation:

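  • The model reaches 79.1% validation accuracy on a challenging 196-class fine-grained task. The gap to the 99.7% training accuracy points to substantial overfitting, so stronger regularization or augmentation may help.
  • High Top-3/5 accuracy (90.9% / 93.4%) shows that the correct class usually ranks near the top even when the top-1 prediction is wrong.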
  • Grad-CAM++ shows reliable model focus on class-discriminative features.

🚀 Try the Live Demo

Gradio App (Demo) Hugging Face Model Card




Quickstart: Inference & Demo

1. Install Dependencies

```bash
pip install -r requirements.txt
# The pytorch-grad-cam library is published on PyPI as "grad-cam"
pip install grad-cam gradio
```

2. Run Inference
```python
import json

import torch
from PIL import Image
from torchvision import models, transforms

# Rebuild the architecture: ResNet50 backbone with the custom two-layer head
model = models.resnet50(weights=None)
model.fc = torch.nn.Sequential(
    torch.nn.Linear(model.fc.in_features, 512),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.2),
    torch.nn.Linear(512, 196),
)
model.load_state_dict(torch.load("twin_car_best_model_v2.pth", map_location="cpu"))
model.eval()

# Preprocess: resize and normalize with ImageNet statistics
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# Convert to RGB so grayscale or RGBA files still yield a 3-channel tensor
img = Image.open("your_image.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0)  # add the batch dimension

# Predict the most likely class index
with torch.no_grad():
    output = model(input_tensor)
    pred = output.argmax(1).item()

# Map the class index to its name
with open("class_mapping.json") as f:
    class_map = json.load(f)
print("Predicted class:", class_map[str(pred)])
```