DeepLabV3 ResNet-50 - Cityscapes (Numpy Format)

This model is a DeepLabV3 architecture with a ResNet-50 backbone trained on the Cityscapes dataset (converted to .npy format) for urban scene semantic segmentation.

  • 🧠 Architecture: DeepLabV3 + ResNet-50
  • 🗺️ Dataset: Cityscapes (train/val split, .npy images and labels)
  • 🏁 Input Resolution: 360 x 720
  • 🎯 Task: Semantic Segmentation
  • 📦 Framework: PyTorch / Torchvision

✨ Training Summary

  • Epochs: 1
  • Optimizer: Adam
  • Loss: CrossEntropyLoss
  • Final Loss: 0.5157
  • Pixel Accuracy: 0.7805
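
The card does not say how the pixel accuracy above was computed. As a minimal sketch, assuming the usual definition (the fraction of labeled pixels whose predicted class matches the ground truth) and assuming unlabeled pixels are marked with 255 (a common Cityscapes convention, not confirmed for this dataset), it could be calculated as below. The function name `pixel_accuracy` and the `ignore_index` parameter are illustrative, not part of this repository.

```python
import numpy as np

def pixel_accuracy(pred, target, ignore_index=255):
    """Fraction of valid pixels where the prediction equals the label."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    valid = target != ignore_index  # assumption: 255 marks unlabeled pixels
    if valid.sum() == 0:
        return 0.0  # no labeled pixels to score
    return float((pred[valid] == target[valid]).mean())

# Toy 2x3 example: 5 valid pixels, 4 of them predicted correctly
pred = np.array([[0, 1, 1], [2, 2, 0]])
target = np.array([[0, 1, 2], [2, 2, 255]])
print(pixel_accuracy(pred, target))
```

Pixel accuracy is dominated by large classes such as road and building, so it can look reasonable even when small classes are segmented poorly.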

🗂️ Files

  • pytorch_model.bin: model weights
  • config.json: simple JSON config
  • README.md: this model card

📥 Usage

Inference Example

import torch
from torchvision import models, transforms
from PIL import Image
import numpy as np

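# Load model
# `weights=None, weights_backbone=None` replaces the deprecated `pretrained=False`
# (torchvision >= 0.13) and avoids downloading ImageNet backbone weights
model = models.segmentation.deeplabv3_resnet50(
    weights=None, weights_backbone=None, num_classes=19
)
# map_location="cpu" lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))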
model.eval()

# Prepare input image (resized to 360x720, height x width, to match training)
img = Image.open("your_image.jpg").convert("RGB")
transform = transforms.Compose([
    transforms.Resize((360, 720)),
    transforms.ToTensor()
])
input_tensor = transform(img).unsqueeze(0)

# Predict
with torch.no_grad():
    output = model(input_tensor)["out"]
    pred = torch.argmax(output.squeeze(), dim=0).cpu().numpy()

# Now you can decode `pred` using your label color map
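
As a rough sketch of that decoding step, the snippet below maps each class index to an RGB color by indexing a palette array. It assumes the 19 classes follow the standard Cityscapes train-ID order and colors, which this card does not confirm; `CITYSCAPES_PALETTE` and `decode_segmap` are illustrative names, and `demo_pred` stands in for the `pred` array from the inference example.

```python
import numpy as np

# Assumption: class indices follow the standard 19-class Cityscapes train IDs
CITYSCAPES_PALETTE = np.array([
    (128, 64, 128),   # 0 road
    (244, 35, 232),   # 1 sidewalk
    (70, 70, 70),     # 2 building
    (102, 102, 156),  # 3 wall
    (190, 153, 153),  # 4 fence
    (153, 153, 153),  # 5 pole
    (250, 170, 30),   # 6 traffic light
    (220, 220, 0),    # 7 traffic sign
    (107, 142, 35),   # 8 vegetation
    (152, 251, 152),  # 9 terrain
    (70, 130, 180),   # 10 sky
    (220, 20, 60),    # 11 person
    (255, 0, 0),      # 12 rider
    (0, 0, 142),      # 13 car
    (0, 0, 70),       # 14 truck
    (0, 60, 100),     # 15 bus
    (0, 80, 100),     # 16 train
    (0, 0, 230),      # 17 motorcycle
    (119, 11, 32),    # 18 bicycle
], dtype=np.uint8)

def decode_segmap(pred, palette=CITYSCAPES_PALETTE):
    """Map an (H, W) array of class indices to an (H, W, 3) uint8 RGB image."""
    pred = np.asarray(pred)
    return palette[pred]  # integer-array indexing looks up one color per pixel

# Stand-in for the (360, 720) `pred` array from the inference example
demo_pred = np.array([[0, 10], [13, 18]])
color = decode_segmap(demo_pred)
print(color.shape)
```

The result can be saved with `Image.fromarray(color).save("prediction.png")`, where the output filename is arbitrary.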

🏙️ Dataset

The Cityscapes dataset was used in .npy format with the following folder structure:

/data/train/image/*.npy   # RGB images as float64
/data/train/label/*.npy   # label masks

Each .npy image is a (128, 256, 3) float64 array with values scaled to 0-1. Images were resized to 360 x 720 (height x width) before being passed to DeepLabV3.
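
The preprocessing described above might look roughly like the sketch below, which turns one image array into a batched model input. The resize method is not stated in this card, so bilinear interpolation is an assumption, and `npy_image_to_input` is a hypothetical helper name. A random array stands in for a file loaded with `np.load`.

```python
import numpy as np
import torch
import torch.nn.functional as F

def npy_image_to_input(array, size=(360, 720)):
    """Convert an (H, W, 3) array in [0, 1] to a (1, 3, 360, 720) float32 tensor."""
    tensor = torch.from_numpy(np.asarray(array)).float()  # float64 -> float32
    tensor = tensor.permute(2, 0, 1).unsqueeze(0)          # (H, W, 3) -> (1, 3, H, W)
    # Assumption: bilinear resize; the card does not state the method used
    return F.interpolate(tensor, size=size, mode="bilinear", align_corners=False)

# Stand-in for an array loaded from /data/train/image/ with np.load
demo = np.random.rand(128, 256, 3)
batch = npy_image_to_input(demo)
print(batch.shape, batch.dtype)
```

If label masks are resized as well, nearest-neighbor interpolation is the usual choice, since bilinear interpolation would blend class IDs into values that do not correspond to any class.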

🧾 License

This model is released under the Apache 2.0 License.


Trained and uploaded from Kaggle Notebook using PyTorch + Hugging Face Hub.

