# 🎯 Dressify Training Parameters Guide

## Overview

The Dressify system provides **comprehensive parameter control** for both the ResNet item embedder and the ViT outfit encoder. This guide covers all the "knobs" you can tweak to experiment with different training configurations.

## 🖼️ ResNet Item Embedder Parameters

### Model Architecture

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Backbone Architecture** | `resnet50`, `resnet101` | `resnet50` | Base CNN architecture for feature extraction |
| **Embedding Dimension** | 128-1024 | 512 | Output embedding vector size (must match ViT input) |
| **Use ImageNet Pretrained** | `true`/`false` | `true` | Initialize with ImageNet weights |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in the projection head for regularization |

### Training Parameters

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 20 | Number of passes over the training data |
| **Batch Size** | 8-128 | 64 | Images per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 1e-3 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-6 to 1e-2 | 1e-4 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.2 | Distance margin for triplet loss |

## 🧠 ViT Outfit Encoder Parameters

### Model Architecture

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Embedding Dimension** | 128-1024 | 512 | Input embedding size (must match ResNet output) |
| **Transformer Layers** | 2-12 | 6 | Number of transformer encoder layers |
| **Attention Heads** | 4-16 | 8 | Number of multi-head attention heads |
| **Feed-Forward Multiplier** | 2-8 | 4 | Hidden layer size multiplier |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in transformer layers |

### Training Parameters

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 30 | Number of passes over the training data |
| **Batch Size** | 4-64 | 32 | Outfits per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 5e-4 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-4 to 1e-1 | 5e-2 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.3 | Distance margin for triplet loss |

## ⚙️ Advanced Training Settings

### Hardware Optimization

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Mixed Precision (AMP)** | `true`/`false` | `true` | Use automatic mixed precision for faster training |
| **Channels Last Memory** | `true`/`false` | `true` | Use channels_last memory format for CUDA optimization |
| **Gradient Clipping** | 0.1-5.0 | 1.0 | Clip gradient norm to prevent exploding gradients |

### Learning Rate Scheduling

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Warmup Epochs** | 0-10 | 3 | Epochs of gradual learning rate increase at the start |
| **Learning Rate Scheduler** | `cosine`, `step`, `plateau`, `linear` | `cosine` | LR decay strategy |
| **Early Stopping Patience** | 5-20 | 10 | Stop training after this many epochs without improvement |

### Training Strategy

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Triplet Mining Strategy** | `semi_hard`, `hardest`, `random` | `semi_hard` | Negative sample selection method |
| **Data Augmentation Level** | `minimal`, `standard`, `aggressive` | `standard` | Image augmentation intensity |
| **Random Seed** | 0-9999 | 42 | Seed for reproducible training runs |

## 🔬 Parameter Impact Analysis

### High Impact Parameters (Experiment First)

#### 1. **Learning Rate** 🎯
- **Too High**: Training instability, loss spikes
- **Too Low**: Slow convergence, risk of getting stuck in poor local minima
- **Sweet Spot**: 1e-3 for ResNet, 5e-4 for ViT
- **Try**: 1e-4, 1e-3, 5e-3, 1e-2

#### 2. **Batch Size** 📦
- **Small**: Noisier gradients, often better generalization, slower training
- **Large**: Faster training, but may generalize worse
- **Memory Constraint**: GPU VRAM limits the maximum size
- **Try**: 16, 32, 64, 128

#### 3. **Triplet Margin** 📏
- **Small**: Easier triplets, faster convergence
- **Large**: Harder triplets, potentially better-separated embeddings
- **Balance**: 0.2-0.3 is typically optimal
- **Try**: 0.1, 0.2, 0.3, 0.5

### Medium Impact Parameters

#### 4. **Embedding Dimension** 🔢
- **Small**: Faster inference, less expressive
- **Large**: More expressive, slower inference
- **Trade-off**: 512 is a good balance
- **Try**: 256, 512, 768, 1024

#### 5. **Transformer Layers** 🏗️
- **Few**: Faster training, less capacity
- **Many**: More capacity, slower training
- **Sweet Spot**: 4-8 layers
- **Try**: 4, 6, 8, 12

#### 6. **Optimizer Choice** ⚡
- **AdamW**: Best for most cases (default)
- **Adam**: Good alternative
- **SGD**: Often better generalization, slower convergence
- **RMSprop**: Alternative to Adam

### Low Impact Parameters (Fine-tune Last)

#### 7. **Weight Decay** 🛡️
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 1e-4 (ResNet), 5e-2 (ViT)

#### 8. **Dropout Rate** 💧
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 0.1 for both models

#### 9. **Attention Heads** 👁️
- **Rule**: Must divide the embedding dimension evenly
- **Default**: 8 heads for 512 dimensions
- **Try**: 4, 8, 16

## 🚀 Recommended Parameter Combinations

### Quick Experimentation

```yaml
# Fast Training (Low Quality)
resnet_epochs: 5
vit_epochs: 10
batch_size: 16
learning_rate: 1e-3
```

### Balanced Training

```yaml
# Standard Quality (Default)
resnet_epochs: 20
vit_epochs: 30
batch_size: 64
learning_rate: 1e-3
triplet_margin: 0.2
```

### High Quality Training

```yaml
# High Quality (Longer Training)
resnet_epochs: 50
vit_epochs: 100
batch_size: 32
learning_rate: 5e-4
triplet_margin: 0.3
warmup_epochs: 5
```

### Research Experiments

```yaml
# Research Configuration
resnet_backbone: resnet101
embedding_dim: 768
transformer_layers: 8
attention_heads: 12
mining_strategy: hardest
augmentation_level: aggressive
```

## 📊 Parameter Tuning Workflow

### 1. **Baseline Training** 📈

```bash
# Start with default parameters
./scripts/train_item.sh
./scripts/train_outfit.sh
```

### 2. **Learning Rate Sweep** 🔍

```yaml
# Test different learning rates
learning_rates: [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]
epochs: 5  # Quick test
```

### 3. **Architecture Search** 🏗️

```yaml
# Test different model sizes
embedding_dims: [256, 512, 768, 1024]
transformer_layers: [4, 6, 8, 12]
```

### 4. **Training Strategy** 🎯

```yaml
# Test different strategies
mining_strategies: [random, semi_hard, hardest]
augmentation_levels: [minimal, standard, aggressive]
```

### 5. **Hyperparameter Optimization** ⚡

```yaml
# Fine-tune best combinations
learning_rate: [4e-4, 5e-4, 6e-4]
batch_size: [24, 32, 40]
triplet_margin: [0.25, 0.3, 0.35]
```

## 🧪 Monitoring Training Progress

### Key Metrics to Watch

1. **Training Loss**: Should decrease steadily
2. **Validation Loss**: Should decrease without signs of overfitting
3. **Triplet Accuracy**: Should increase over time
4. **Embedding Quality**: Check with t-SNE visualization

### Early Stopping Signs

- Loss plateaus for 5+ epochs
- Validation loss increases while training loss decreases
- Triplet accuracy stops improving

### Success Indicators

- Smooth loss curves
- Consistent improvement in metrics
- Good generalization (validation ≈ training)

## 🔧 Advanced Parameter Combinations

### Memory-Constrained Training

```yaml
# For limited GPU memory
batch_size: 16
embedding_dim: 256
transformer_layers: 4
use_mixed_precision: true
channels_last: true
```

### High-Speed Training

```yaml
# For quick iterations
epochs: 10
batch_size: 128
learning_rate: 2e-3
warmup_epochs: 1
early_stopping_patience: 5
```

### Maximum Quality Training

```yaml
# For production models
epochs: 100
batch_size: 32
learning_rate: 1e-4
warmup_epochs: 10
early_stopping_patience: 20
mining_strategy: hardest
augmentation_level: aggressive
```

## 📝 Parameter Logging

### Save Your Experiments

```python
# Each training run saves:
# - Custom config JSON
# - Training metrics
# - Model checkpoints
# - Training logs
```

### Track Changes

```yaml
# Document parameter changes:
experiment_001:
  changes: "Increased embedding_dim from 512 to 768"
  results: "Better triplet accuracy, slower training"
  next_steps: "Try reducing learning rate"

experiment_002:
  changes: "Changed mining_strategy to hardest"
  results: "Harder training, better embeddings"
  next_steps: "Increase triplet_margin"
```

## 🎯 Pro Tips

### 1. **Start Simple** 🚀
- Begin with default parameters
- Change one parameter at a time
- Document every change

### 2. **Use Quick Training** ⚡
- Test parameters with 1-5 epochs first
- Validate promising combinations with full training
- Avoid wasting time on bad parameter combinations

### 3. **Monitor Resources** 💻
- Watch GPU memory usage
- Monitor training time per epoch
- Balance quality vs. speed

### 4. **Validate Changes** ✅
- Always check validation metrics
- Compare with baseline performance
- Ensure improvements are consistent

### 5. **Save Everything** 💾
- Keep all experiment configs
- Save intermediate checkpoints
- Log training curves and metrics

---

**Happy Parameter Tuning! 🎉**

*Remember: The best parameters depend on your specific dataset, hardware, and requirements. Experiment systematically and document everything!*
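## Appendix: Warmup + Cosine Schedule Sketch

The warmup epochs and cosine scheduler settings described in this guide combine into a single per-epoch learning-rate curve. The sketch below is a minimal, framework-free illustration of that curve using the ViT defaults (base LR 5e-4, 3 warmup epochs, 30 total epochs); the function name `lr_at_epoch` and its signature are illustrative, not Dressify's actual implementation, and in practice you would use your training framework's scheduler.

```python
import math

def lr_at_epoch(epoch, base_lr=5e-4, warmup_epochs=3, total_epochs=30, min_lr=0.0):
    """Linear warmup followed by cosine decay (one LR value per epoch)."""
    if epoch < warmup_epochs:
        # Linear ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# One LR value per epoch for a 30-epoch run.
schedule = [lr_at_epoch(e) for e in range(30)]
```

With these defaults the rate climbs during epochs 0-2, peaks at the base LR when decay begins, and falls toward zero by the final epoch, which is the shape the `cosine` + warmup combination in the Learning Rate Scheduling table produces.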