# DeepLabV3 ResNet-50 - Cityscapes (Numpy Format)

This model is a DeepLabV3 architecture with a ResNet-50 backbone, trained on the Cityscapes dataset (converted to `.npy` format) for urban scene semantic segmentation.
- Architecture: DeepLabV3 + ResNet-50
- Dataset: Cityscapes (train/val split, `.npy` images and labels)
- Input Resolution: 360 x 720
- Task: Semantic Segmentation
- Framework: PyTorch / Torchvision
## Training Summary

- Epochs: 1
- Optimizer: Adam
- Loss: CrossEntropyLoss
- Final Loss: 0.5157
- Pixel Accuracy: 0.7805

A minimal sketch of the training setup these settings imply is shown below.
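The following sketch only illustrates the configuration listed above (Adam, CrossEntropyLoss, 1 epoch); the dummy tensors, batch size, and learning rate are assumptions standing in for the original Kaggle notebook, not the actual training code.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Dummy tensors standing in for the .npy Cityscapes data (shapes only).
images = torch.rand(4, 3, 360, 720)           # float images in [0, 1]
masks = torch.randint(0, 19, (4, 360, 720))   # int64 class ids, 19 classes
train_loader = DataLoader(TensorDataset(images, masks), batch_size=2)

model = models.segmentation.deeplabv3_resnet50(pretrained=False, num_classes=19)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).train()

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)   # lr is an assumption

for epoch in range(1):                                # 1 epoch, per the summary
    for imgs, targets in train_loader:
        imgs, targets = imgs.to(device), targets.to(device)
        optimizer.zero_grad()
        out = model(imgs)["out"]                      # (N, 19, H, W) logits
        loss = criterion(out, targets)
        loss.backward()
        optimizer.step()
```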
## Files

- `pytorch_model.bin`: model weights
- `config.json`: simple JSON config
- `README.md`: this model card
## Usage

### Inference Example
```python
import torch
from torchvision import models, transforms
from PIL import Image
import numpy as np

# Load model
model = models.segmentation.deeplabv3_resnet50(pretrained=False, num_classes=19)
model.load_state_dict(torch.load("pytorch_model.bin"))
model.eval()

# Prepare input image (must be 360x720)
img = Image.open("your_image.jpg").convert("RGB")
transform = transforms.Compose([
    transforms.Resize((360, 720)),
    transforms.ToTensor()
])
input_tensor = transform(img).unsqueeze(0)

# Predict
with torch.no_grad():
    output = model(input_tensor)["out"]
pred = torch.argmax(output.squeeze(), dim=0).cpu().numpy()

# Now you can decode `pred` using your label color map
```
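The last step above leaves `pred` as a `(H, W)` array of class ids. As a hedged illustration of the decoding mentioned in the final comment, the sketch below maps class ids to colors; the `decode_segmap` helper is made up for this card and the palette covers only a few of the 19 Cityscapes train classes.

```python
import numpy as np
from PIL import Image

# Partial palette: class id -> RGB, using a few standard Cityscapes train-id
# colors; extend it to all 19 classes for real use.
PALETTE = {
    0: (128, 64, 128),   # road
    1: (244, 35, 232),   # sidewalk
    2: (70, 70, 70),     # building
    8: (107, 142, 35),   # vegetation
    13: (0, 0, 142),     # car
}

def decode_segmap(pred: np.ndarray) -> Image.Image:
    """Map a (H, W) array of class ids to an RGB PIL image."""
    rgb = np.zeros((*pred.shape, 3), dtype=np.uint8)
    for class_id, color in PALETTE.items():
        rgb[pred == class_id] = color
    return Image.fromarray(rgb)

# Usage: decode_segmap(pred).save("prediction.png")
```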
## Dataset

The Cityscapes dataset was used in `.npy` format with the following folder structure:

```
/data/train/image/*.npy   # RGB images as float64
/data/train/label/*.npy   # label masks
```

Each `.npy` image was a (128, 256, 3) float64 array scaled to 0-1 and resized to 360x720 for DeepLabV3.
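For reference, a minimal sketch of loading one such `.npy` image and bringing it to the 360x720 input size; the file path and the float64-to-float32 conversion are assumptions, not part of the original pipeline.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Load one .npy image (shape (128, 256, 3), float64, values in [0, 1]).
arr = np.load("/data/train/image/sample.npy")   # path is illustrative

# HWC float64 -> CHW float32 tensor with a batch dimension.
x = torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0).float()

# Resize to the 360x720 resolution the model expects.
x = F.interpolate(x, size=(360, 720), mode="bilinear", align_corners=False)
print(x.shape)  # torch.Size([1, 3, 360, 720])
```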
## License

This model is released under the Apache 2.0 License.

Trained and uploaded from a Kaggle Notebook using PyTorch + Hugging Face Hub.