Model Card for keerthikoganti/architecture-design-stages-compact-cnn
Model Details
Model Description
ArchiTutor is a compact convolutional neural network (CNN) that classifies images of architecture projects into discrete design stages commonly seen in studio workflows: Brainstorm, Design Iteration, Optimization/Detailing, and Final Review/Presentation (class names configurable). The goal is to support design pedagogy and analytics by tagging studio artifacts over time.
- Task: Image classification (multi-class)
- Inputs: RGB images of architecture artifacts (sketches, diagrams, renders, boards, screenshots)
- Outputs: One of the design-stage labels, with class probabilities
- Intended audience: Architecture students, instructors, design researchers, education tech tools
- Developed by: Keerthi Koganti (Carnegie Mellon University)
- Model type: Compact Convolutional Neural Network (CNN)
- Language(s) (NLP): English
- License: MIT
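For illustration, one possible index-to-label mapping for the four stages (the class names and their order are configurable, so the mapping shipped with a given checkpoint may differ):

```python
# Hypothetical index -> stage-label mapping; the real one is loaded from
# labels.IDX2LABEL alongside the checkpoint (see "How to Get Started" below).
IDX2LABEL = {
    0: "Brainstorm",
    1: "Design Iteration",
    2: "Optimization/Detailing",
    3: "Final Review/Presentation",
}
```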
Uses
Direct Use
- Auto-tagging student submissions by stage for feedback dashboards
- Curating datasets of process images for research on studio workflows
- Searching/filtering large archives by stage
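A minimal sketch of the auto-tagging/filtering use case, reusing the loading helper and preprocessing shown in "How to Get Started with the Model" below; the `submissions/` folder and the `.jpg`-only filter are assumptions for illustration:

```python
from pathlib import Path

import torch
from torchvision import transforms
from PIL import Image

from model import load_model   # same helper used in "How to Get Started" below
from labels import IDX2LABEL   # index -> stage-label mapping

# Sketch of the auto-tagging use case: walk a folder of student submissions
# (hypothetical "submissions/" directory) and record one predicted stage per image.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = load_model(checkpoint_path="checkpoints/best.pt").to(device).eval()

tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

stage_tags = {}
for path in sorted(Path("submissions").glob("*.jpg")):
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        logits = model(tfm(img).unsqueeze(0).to(device))
    stage_tags[path.name] = IDX2LABEL[int(logits.argmax(dim=1).item())]

print(stage_tags)  # e.g., feed into a feedback dashboard or archive filter
```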
Out-of-Scope Use
- Not a critique engine; it does not assess design quality
- May struggle with ambiguous mixed-stage boards or atypical media (e.g., code screenshots)
- Performance depends on domain similarity (studio imagery vs. unrelated graphics)
Bias, Risks, and Limitations
- Data imbalance: The dataset may contain more examples of final presentation boards than early sketches or optimization models, biasing predictions toward later stages.
- Style bias: If most training images come from specific software (e.g., Rhino/Grasshopper or Revit renderings), the model may underperform on hand drawings, mixed-media collages, or atypical workflows.
Recommendations
- Diversify training data: Expand datasets to include hand sketches, BIM screenshots, and diverse cultural/academic styles to reduce bias.
- Apply fairness checks: Periodically assess per-class and per-style accuracy metrics to ensure no overfitting to dominant visual tropes.
- Document provenance: Keep metadata on dataset sources, creators, and usage consent for transparency.
- Avoid high-stakes use: The model should not be used for academic assessment, admissions, or publication decisions.
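A minimal sketch of such a per-class fairness check, assuming hypothetical `y_true`/`y_pred` lists of stage labels collected from a held-out evaluation set and using scikit-learn:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical ground-truth and predicted stage labels for a held-out split;
# in practice these come from running the model over a labeled evaluation set.
y_true = ["Brainstorm", "Design Iteration", "Final Review/Presentation", "Brainstorm"]
y_pred = ["Brainstorm", "Final Review/Presentation", "Final Review/Presentation", "Brainstorm"]

# Per-class precision/recall/F1 makes class imbalance and style bias visible.
print(classification_report(y_true, y_pred, zero_division=0))
print(confusion_matrix(y_true, y_pred))
```

The same report can be computed separately per media style (hand sketch, render, board) to surface style bias alongside class imbalance.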
How to Get Started with the Model
Use the code below to get started with the model.
```python
import torch
from torchvision import transforms
from PIL import Image

from model import load_model   # your helper
from labels import IDX2LABEL   # list or dict mapping

device = "cuda" if torch.cuda.is_available() else "cpu"
model = load_model(checkpoint_path="checkpoints/best.pt").to(device).eval()

tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    logits = model(tfm(img).unsqueeze(0).to(device))
probs = torch.softmax(logits, dim=1).squeeze().cpu().tolist()

pred_idx = int(torch.argmax(logits, dim=1).item())
print(IDX2LABEL[pred_idx], probs[pred_idx])
```
Training Details
Training Data
Training Procedure
Training Hyperparameters
- Framework: PyTorch
- Backbone: Compact CNN (e.g., MobileNetV3-Small or custom, ~1–3M params)
- Head: Global pooling → Dropout → Linear (num_classes)
- Loss: Cross-entropy
- Optimizer: AdamW (lr=3e-4, wd=1e-4)
- Scheduler: Cosine decay with warmup (e.g., 5 epochs)
- Augmentations: RandomResizedCrop(224), RandomHorizontalFlip, small ColorJitter
- Batch size / Epochs: [e.g., 64 / 30] (early stopping on val loss)
- Mixed precision: Recommended (AMP)
- Hardware: [e.g., 1× A100 / 1× RTX 3060]
- Reproducibility: Set seeds, log versions (torch, CUDA), save train/val metrics
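A minimal sketch of the training setup described above, assuming a hypothetical `train_loader` of (image, stage-label) batches, four classes, and MobileNetV3-Small as the compact backbone; the values mirror the list but are illustrative rather than the exact training script:

```python
import torch
import torch.nn as nn
from torchvision import models

torch.manual_seed(0)  # reproducibility: fix seeds; log torch/CUDA versions separately
device = "cuda" if torch.cuda.is_available() else "cpu"

# Compact backbone with a replaced classification head (pooling -> dropout -> linear).
num_classes = 4
model = models.mobilenet_v3_small(weights="IMAGENET1K_V1")
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, num_classes)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)

# Cosine decay with a 5-epoch linear warmup.
epochs, warmup_epochs = 30, 5
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer,
    schedulers=[
        torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=warmup_epochs),
        torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs - warmup_epochs),
    ],
    milestones=[warmup_epochs],
)

scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # mixed precision (AMP)

for epoch in range(epochs):
    model.train()
    for images, labels in train_loader:  # hypothetical DataLoader of augmented training batches
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=(device == "cuda")):
            loss = criterion(model(images), labels)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    scheduler.step()
    # validation and early stopping on val loss would go here
```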
Citation
Generative AI (ChatGPT) and Google Colab were used to help create this model and model card.
Model Card Contact
Maintainer: Keerthi Koganti