# 🚀 Production Deployment Guide for Dressify

## Overview

This guide explains how to deploy Dressify as a production-ready outfit recommendation service using the official Polyvore dataset splits.

## 🎯 Key Changes Made

### 1. **Official Split Usage** ✅

- **Before**: System tried to create random 70/15/15 splits
- **After**: System uses official splits from `nondisjoint/` and `disjoint/` folders
- **Benefit**: Reproducible, research-grade results

### 2. **Robust Dataset Detection** 🔍

- Automatically detects official splits in multiple locations
- Falls back to metadata extraction if needed
- No more random split creation by default

### 3. **Production-Ready Startup** 🚀

- Comprehensive error handling and diagnostics
- Clear status reporting
- Automatic dataset verification

## 📁 Dataset Structure

The system expects this structure after download:

```
data/Polyvore/
├── images/                        # Extracted from images.zip
├── nondisjoint/                   # Official splits (preferred)
│   ├── train.json                 # 31.8 MB - Training outfits
│   ├── valid.json                 # 2.99 MB - Validation outfits
│   └── test.json                  # 5.97 MB - Test outfits
├── disjoint/                      # Alternative official splits
│   ├── train.json                 # 9.65 MB - Training outfits
│   ├── valid.json                 # 1.72 MB - Validation outfits
│   └── test.json                  # 8.36 MB - Test outfits
├── polyvore_item_metadata.json    # 105 MB - Item metadata
├── polyvore_outfit_titles.json    # 6.97 MB - Outfit information
└── categories.csv                 # 4.91 KB - Category mappings
```

## 🚀 Deployment Steps

### Step 1: Initial Setup

```bash
# Clone the repository
git clone <repository-url>
cd recomendation

# Install dependencies
pip install -r requirements.txt
```

### Step 2: Dataset Preparation

```bash
# Run the startup fix script
python startup_fix.py
```

This script will:

1. ✅ Download the Polyvore dataset from Hugging Face
2. ✅ Extract images from images.zip
3. ✅ Detect official splits in nondisjoint/ and disjoint/
4. ✅ Create training splits from official data
5. ✅ Verify all components are ready

### Step 3: Verify Dataset

```bash
# Check dataset status
python -c "
from utils.data_fetch import check_dataset_structure
import json
structure = check_dataset_structure('data/Polyvore')
print(json.dumps(structure, indent=2))
"
```

Expected output:

```json
{
  "status": "ready",
  "images": {
    "exists": true,
    "count": 100000,
    "extensions": [".jpg", ".jpeg", ".png"]
  },
  "splits": {
    "nondisjoint": {
      "train.json": {"exists": true, "size_mb": 31.8},
      "valid.json": {"exists": true, "size_mb": 2.99},
      "test.json": {"exists": true, "size_mb": 5.97}
    }
  }
}
```

### Step 4: Launch Application

```bash
# Start the main application
python app.py
```

The system will:

1. 🔍 Check dataset status
2. ✅ Load official splits
3. 🚀 Launch the Gradio interface
4. 🎯 Be ready for training and inference

## 🔧 Troubleshooting

### Issue: "No official splits found"

**Cause**: The dataset download didn't include the split files.

**Solution**:

```bash
# Check what was downloaded
ls -la data/Polyvore/

# Re-run the data fetcher
python -c "
from utils.data_fetch import ensure_dataset_ready
ensure_dataset_ready()
"
```

### Issue: "Dataset preparation failed"

**Cause**: The prepare script couldn't parse the official splits.

**Solution**:

```bash
# Check split file format
head -20 data/Polyvore/nondisjoint/train.json

# Run preparation manually
python scripts/prepare_polyvore.py --root data/Polyvore
```

### Issue: "Out of memory during training"

**Cause**: GPU memory is insufficient for the default batch sizes.

**Solution**: Use the Advanced Training interface to reduce batch sizes:

- ResNet: reduce from 64 to 16-32
- ViT: reduce from 32 to 8-16
- Enable mixed precision (AMP)
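The AMP option corresponds to PyTorch automatic mixed precision. If you train from a script rather than the interface, the pattern looks roughly like the sketch below; the model, batch, and loss here are stand-ins, not Dressify's actual training loop:

```python
# AMP sketch with a stand-in model -- illustrative only, not Dressify's
# actual training code.
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

# Hypothetical embedder and data; note the reduced batch size of 16 (down from 64).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

images = torch.randn(16, 3, 224, 224, device=device)
targets = torch.randn(16, 512, device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device, enabled=use_amp):
    # Forward pass runs in half precision inside this region,
    # which is where most of the memory savings come from.
    loss = nn.functional.mse_loss(model(images), targets)

scaler.scale(loss).backward()  # scale the loss so fp16 gradients don't underflow
scaler.step(optimizer)         # unscales gradients, then steps the optimizer
scaler.update()                # adapts the scale factor for the next iteration
```

Mixed precision roughly halves activation memory, which together with the smaller batch sizes above is usually enough headroom on a memory-constrained GPU.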
## 🎯 Production Configuration

### Environment Variables

```bash
export EXPORT_DIR="models/exports"
export POLYVORE_ROOT="data/Polyvore"
export CUDA_VISIBLE_DEVICES="0"  # Specify GPU
```

### Docker Deployment

```bash
# Build image
docker build -t dressify .

# Run container
docker run -p 7860:7860 -p 8000:8000 \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/models:/app/models \
  dressify
```

### Hugging Face Space

1. Upload the entire `recomendation/` folder
2. Set the Space type to "Gradio"
3. The system auto-bootstraps on first run
4. Uses official splits for production-quality results

## 📊 Expected Performance

### Dataset Statistics

- **Total Images**: ~100,000 fashion items
- **Training Outfits**: ~50,000 (nondisjoint) or ~20,000 (disjoint)
- **Validation Outfits**: ~5,000 (nondisjoint) or ~2,000 (disjoint)
- **Test Outfits**: ~10,000 (nondisjoint) or ~4,000 (disjoint)

### Training Times (L4 GPU)

- **ResNet Item Embedder**: 2-4 hours (20 epochs)
- **ViT Outfit Encoder**: 1-2 hours (30 epochs)
- **Total**: 3-6 hours for full training

### Inference Performance

- **Item Embedding**: < 50 ms per image
- **Outfit Generation**: < 100 ms per outfit
- **Memory Usage**: ~2-4 GB GPU VRAM

## 🔬 Research vs Production

Both modes run the same preparation command; which splits are used depends on which official split folder the script detects, with `nondisjoint/` preferred when both are present.

### Research Mode

```bash
# Use disjoint splits (smaller, more challenging) -- picked up
# automatically when nondisjoint/ is not present
python scripts/prepare_polyvore.py --root data/Polyvore
```

### Production Mode

```bash
# Use nondisjoint splits (larger, more robust) -- the default
# whenever nondisjoint/ is present
python scripts/prepare_polyvore.py --root data/Polyvore
```

## 📝 Monitoring & Logging

### Training Logs

```bash
# Check training progress
tail -f models/exports/training.log

# Monitor GPU usage
nvidia-smi -l 1
```

### System Health

```bash
# Health check endpoint
curl http://localhost:8000/health
```

Expected response:

```json
{
  "status": "ok",
  "device": "cuda:0",
  "resnet": "resnet50_v2",
  "vit": "vit_outfit_v1"
}
```
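For automated monitoring, the same check can run from Python. Below is a minimal liveness-probe sketch against the `/health` endpoint shown above; the `requests` dependency, timeout, and exit behavior are illustrative choices, not part of Dressify:

```python
# Minimal liveness probe for the /health endpoint shown above.
# The URL and expected fields mirror the example response; adjust as needed.
import sys

import requests

try:
    resp = requests.get("http://localhost:8000/health", timeout=5)
    resp.raise_for_status()
    payload = resp.json()
except (requests.RequestException, ValueError) as exc:
    # Connection errors, HTTP errors, and malformed JSON all count as failures.
    sys.exit(f"Health check failed: {exc}")

if payload.get("status") != "ok":
    sys.exit(f"Service unhealthy: {payload}")

print(f"OK on {payload.get('device')} "
      f"(resnet={payload.get('resnet')}, vit={payload.get('vit')})")
```

A non-zero exit code makes this easy to wire into a cron job or a Docker `HEALTHCHECK`, so the orchestrator can restart the service on repeated failures.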
## 🚨 Emergency Procedures

### Dataset Corruption

```bash
# Remove corrupted data
rm -rf data/Polyvore/splits/

# Re-run preparation
python startup_fix.py
```

### Model Issues

```bash
# Remove corrupted models
rm -rf models/exports/*.pth

# Re-train from scratch
python train_resnet.py --data_root data/Polyvore --epochs 20
python train_vit_triplet.py --data_root data/Polyvore --epochs 30
```

### System Recovery

```bash
# Full system reset
rm -rf data/Polyvore/
rm -rf models/exports/

# Fresh start
python startup_fix.py
```

## ✅ Production Checklist

- [ ] Dataset downloaded successfully (2.5 GB+ of images)
- [ ] Official splits detected in nondisjoint/ or disjoint/
- [ ] Training splits created in data/Polyvore/splits/
- [ ] Models can be trained without errors
- [ ] Inference service responds to health checks
- [ ] Gradio interface loads successfully
- [ ] Advanced training controls work
- [ ] Model checkpoints can be saved and loaded

## 🎉 Success Indicators

When everything is working correctly, you should see:

```
✅ Dataset ready at: data/Polyvore
📊 Images: 100000 files
📋 polyvore_item_metadata.json: 105.0 MB
📋 polyvore_outfit_titles.json: 6.97 MB
🎯 Official splits found:
   ✅ nondisjoint/train.json (31.8 MB)
   ✅ nondisjoint/valid.json (2.99 MB)
   ✅ nondisjoint/test.json (5.97 MB)
🎉 Using official splits from dataset!
✅ Dataset preparation completed successfully!
✅ All required splits verified!
🚀 Your Dressify system is ready to use!
```

## 📞 Support

If you encounter issues:

1. **Check the logs** for specific error messages
2. **Verify the dataset structure** matches the expected layout
3. **Run startup_fix.py** for automated diagnostics
4. **Check GPU memory** and reduce batch sizes if needed
5. **Ensure official splits** are present in nondisjoint/ or disjoint/

---

**🎯 Your Dressify system is now production-ready with official dataset splits!**