---
language: en
license: other
library_name: tensorflow
tags:
- computer-vision
- video-processing
- siamese-network
- match-cut-detection
datasets:
- custom
metrics:
- accuracy
model-index:
- name: siamese_model
results:
- task:
type: image-similarity
subtype: match-cut-detection
metrics:
- type: accuracy
value: 0.956
name: Test Accuracy
---
# Model Card for samanthajmichael/siamese_model.h5
This Siamese neural network model detects match cuts in video sequences by analyzing the visual similarity between frame pairs using optical flow features.
## Model Details
### Model Description
The model uses a Siamese architecture to compare pairs of video frames and determine whether they constitute a match cut, a film editing technique in which visually similar frames create a seamless transition between scenes. Rather than raw pixel values, the model processes optical flow representations of the frames, so it focuses on motion patterns.
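The card does not document how the optical-flow representations are produced. One common approach, shown here purely as an illustrative sketch, is OpenCV's dense Farneback flow encoded as an HSV image (the function name and parameters below are assumptions, not the authors' pipeline):

```python
import cv2
import numpy as np

def flow_image(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    """Encode dense optical flow between two frames as a 3-channel image.

    Illustrative only: the model consumes optical-flow representations,
    but the exact extraction method is not documented in this card.
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Standard HSV encoding: hue = flow direction, value = flow magnitude
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*prev_gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = (ang * 180 / np.pi / 2).astype(np.uint8)
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```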
- **Developed by:** samanthajmichael
- **Model type:** Siamese Neural Network
- **Language(s):** Not applicable (Computer Vision)
- **License:** Other (declared as `other`; no specific license text provided)
- **Finetuned from model:** EfficientNetB0 (used for initial feature extraction)
### Model Sources
- **Repository:** https://github.com/lasyaEd/ml_project
- **Demo:** Available as a Streamlit application for analyzing YouTube videos
## Uses
### Direct Use
The model can be used to:
1. Detect match cuts in video sequences
2. Find visually similar sections within videos
3. Analyze motion patterns between frame pairs
4. Support video editing and content analysis tasks
### Downstream Use
The model can be integrated into:
- Video editing software for automated transition detection
- Content analysis tools for finding visual patterns
- YouTube video analysis applications (as demonstrated in the provided Streamlit app)
- Film studies tools for analyzing editing techniques
### Out-of-Scope Use
This model is not designed for:
- Real-time video processing
- General object detection or recognition
- Scene classification without motion analysis
- Processing single frames in isolation
## Bias, Risks, and Limitations
- The model's performance depends on the quality of optical flow extraction
- May be sensitive to video resolution and frame rate
- Performance may vary based on video content type and editing style
- Not optimized for real-time processing of high-resolution videos
### Recommendations
Users should:
- Ensure input frames are resized to 224x224 and normalized to [0, 1] (see the preprocessing sketch after this list)
- Use high-quality video sources for best results
- Consider the model's confidence scores when making final decisions
- Validate results in the context of their specific use case
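For the first recommendation above, here is a minimal preprocessing sketch. The helper name `preprocess_frame` and the use of OpenCV are assumptions; the exact pipeline used during training is not documented.

```python
import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Resize a frame to 224x224 and scale pixel values to [0, 1]."""
    resized = cv2.resize(frame, (224, 224))
    return resized.astype(np.float32) / 255.0
```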
## How to Get Started with the Model
```python
from huggingface_hub import from_pretrained_keras
import numpy as np

# Load the model from the Hub
model = from_pretrained_keras("samanthajmichael/siamese_model.h5")

# Preprocess your frame pairs: resize to 224x224 and normalize to [0, 1]
# (see the preprocess_frame sketch in the Recommendations section above)
frame1 = preprocess_frame(frame1)  # Shape: (224, 224, 3)
frame2 = preprocess_frame(frame2)  # Shape: (224, 224, 3)

# Get a similarity prediction for the pair (note the added batch dimension)
prediction = model.predict([np.array([frame1]), np.array([frame2])])
```
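`prediction` is a score in [0, 1] from the sigmoid output head; values near 1 indicate a likely match cut, and a fixed threshold (for example 0.5) converts the score into a binary match/non-match decision.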
## Training Details
### Training Data
- Training set: 14,264 frame pairs
- Test set: 3,566 frame pairs
- Data derived from video frames with optical flow features
- Labels generated based on visual similarity thresholds
### Training Procedure
#### Training Hyperparameters
- **Training regime:** fp32
- Optimizer: Adam
- Loss function: Binary Cross-Entropy
- Batch size: 64
- Early stopping patience: 3
- Input shape: (224, 224, 3)
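Put together, the training configuration would look roughly like this in Keras. This is a sketch under the hyperparameters above, not the authors' exact script; `frames_a`, `frames_b`, and `labels` are hypothetical placeholders.

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
early_stop = tf.keras.callbacks.EarlyStopping(patience=3)
model.fit(
    [frames_a, frames_b],     # paired frame inputs (hypothetical names)
    labels,                   # 1 = match cut, 0 = non-match
    batch_size=64,
    epochs=20,                # upper bound; the card reports stopping after 4 epochs
    validation_split=0.1,     # assumption: the card does not specify a validation split
    callbacks=[early_stop],
)
```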
### Model Architecture
- Base network:
- Conv2D (32 filters) + ReLU + MaxPooling2D
- Conv2D (64 filters) + ReLU + MaxPooling2D
- Conv2D (128 filters) + ReLU + MaxPooling2D
- Flatten
- Dense (128 units)
- Similarity computed using absolute difference
- Final dense layer with sigmoid activation
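A Keras sketch of the base network and similarity head described above. The 3x3 kernels and "same" padding are inferred rather than documented; with those choices the sketch reproduces the stated total of 12,938,561 parameters (see Technical Specifications below).

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_base_network(input_shape=(224, 224, 3)) -> Model:
    """Shared embedding network applied to each frame in a pair."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128)(x)
    return Model(inputs, x)

base = build_base_network()
frame_a = layers.Input(shape=(224, 224, 3))
frame_b = layers.Input(shape=(224, 224, 3))

# Similarity head: absolute difference of the two embeddings, then sigmoid
diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([base(frame_a), base(frame_b)])
output = layers.Dense(1, activation="sigmoid")(diff)
model = Model(inputs=[frame_a, frame_b], outputs=output)
```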
## Evaluation
### Testing Data, Factors & Metrics
- Evaluation performed on 3,566 frame pairs
- Balanced dataset of match and non-match pairs
- Primary metric: Binary classification accuracy
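The results below correspond to a standard Keras evaluation pass (a sketch; `test_a`, `test_b`, and `test_labels` are hypothetical placeholders for the 3,566 held-out pairs):

```python
test_loss, test_acc = model.evaluate([test_a, test_b], test_labels, batch_size=64)
print(f"accuracy={test_acc:.4f}, loss={test_loss:.4f}")
```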
### Results
- Test accuracy: 95.60%
- Test loss: 0.1675
- Model shows strong performance in distinguishing match cuts from non-matches
## Environmental Impact
- Trained on Google Colab
- Training completed in 4 epochs with early stopping
- Relatively lightweight model with 12.9M parameters
## Technical Specifications
### Compute Infrastructure
- Training platform: Google Colab
- GPU requirements: Standard GPU runtime
- Inference can be performed on CPU for smaller workloads
### Model Architecture and Objective
Total parameters: 12,938,561 (49.36 MB)
- All parameters are trainable
- Model objective: Binary classification of frame pair similarity
## Model Card Contact
For questions about the model, please contact samanthajmichael through GitHub or Hugging Face.