# DeepLabV3 ResNet-50 - Cityscapes (Numpy Format)

This model is a DeepLabV3 architecture with a ResNet-50 backbone, trained on the Cityscapes dataset (converted to `.npy` format) for urban scene semantic segmentation.
- Architecture: DeepLabV3 + ResNet-50
- Dataset: Cityscapes (train/val split, `.npy` images and labels)
- Input Resolution: 360 x 720
- Task: Semantic Segmentation
- Framework: PyTorch / Torchvision
## Training Summary

- Epochs: 1
- Optimizer: Adam
- Loss: CrossEntropyLoss
- Final Loss: 0.5157
- Pixel Accuracy: 0.7805

A minimal sketch of the training setup these settings imply is shown below.
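The following sketch only illustrates the configuration listed above (Adam, CrossEntropyLoss, 1 epoch); the dummy tensors, batch size, and learning rate are assumptions standing in for the original Kaggle notebook, not the actual training code.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Dummy tensors standing in for the .npy Cityscapes data (shapes only).
images = torch.rand(4, 3, 360, 720)           # float images in [0, 1]
masks = torch.randint(0, 19, (4, 360, 720))   # int64 class ids, 19 classes
train_loader = DataLoader(TensorDataset(images, masks), batch_size=2)

model = models.segmentation.deeplabv3_resnet50(pretrained=False, num_classes=19)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).train()

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)   # lr is an assumption

for epoch in range(1):                                # 1 epoch, per the summary
    for imgs, targets in train_loader:
        imgs, targets = imgs.to(device), targets.to(device)
        optimizer.zero_grad()
        out = model(imgs)["out"]                      # (N, 19, H, W) logits
        loss = criterion(out, targets)
        loss.backward()
        optimizer.step()
```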
## Files

- `pytorch_model.bin`: model weights
- `config.json`: simple JSON config
- `README.md`: this model card
## Usage

### Inference Example
```python
import torch
from torchvision import models, transforms
from PIL import Image
import numpy as np

# Load model
model = models.segmentation.deeplabv3_resnet50(pretrained=False, num_classes=19)
model.load_state_dict(torch.load("pytorch_model.bin"))
model.eval()

# Prepare input image (must be 360x720)
img = Image.open("your_image.jpg").convert("RGB")
transform = transforms.Compose([
    transforms.Resize((360, 720)),
    transforms.ToTensor()
])
input_tensor = transform(img).unsqueeze(0)

# Predict
with torch.no_grad():
    output = model(input_tensor)["out"]
pred = torch.argmax(output.squeeze(), dim=0).cpu().numpy()

# Now you can decode `pred` using your label color map
```
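The last step above leaves `pred` as a `(H, W)` array of class ids. As a hedged illustration of the decoding mentioned in the final comment, the sketch below maps class ids to colors; the `decode_segmap` helper is made up for this card and the palette covers only a few of the 19 Cityscapes train classes.

```python
import numpy as np
from PIL import Image

# Partial palette: class id -> RGB, using a few standard Cityscapes train-id
# colors; extend it to all 19 classes for real use.
PALETTE = {
    0: (128, 64, 128),   # road
    1: (244, 35, 232),   # sidewalk
    2: (70, 70, 70),     # building
    8: (107, 142, 35),   # vegetation
    13: (0, 0, 142),     # car
}

def decode_segmap(pred: np.ndarray) -> Image.Image:
    """Map a (H, W) array of class ids to an RGB PIL image."""
    rgb = np.zeros((*pred.shape, 3), dtype=np.uint8)
    for class_id, color in PALETTE.items():
        rgb[pred == class_id] = color
    return Image.fromarray(rgb)

# Usage: decode_segmap(pred).save("prediction.png")
```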
## Dataset

The Cityscapes dataset was used in `.npy` format with the following folder structure:

```
/data/train/image/*.npy   # RGB images as float64
/data/train/label/*.npy   # label masks
```

Each `.npy` image was a (128, 256, 3) float64 array scaled to 0-1 and resized to 360x720 for DeepLabV3.
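For reference, a minimal sketch of loading one such `.npy` image and bringing it to the 360x720 input size; the file path and the float64-to-float32 conversion are assumptions, not part of the original pipeline.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Load one .npy image (shape (128, 256, 3), float64, values in [0, 1]).
arr = np.load("/data/train/image/sample.npy")   # path is illustrative

# HWC float64 -> CHW float32 tensor with a batch dimension.
x = torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0).float()

# Resize to the 360x720 resolution the model expects.
x = F.interpolate(x, size=(360, 720), mode="bilinear", align_corners=False)
print(x.shape)  # torch.Size([1, 3, 360, 720])
```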
## License

This model is released under the Apache 2.0 License.

Trained and uploaded from a Kaggle Notebook using PyTorch + Hugging Face Hub.