samanthajmichael committed · Commit f34edce · verified · 1 Parent(s): 7bf31f3

Update README.md

  1. README.md +173 -0
README.md CHANGED
@@ -1,4 +1,177 @@
  ---
+ language: en
+ license: other
+ library_name: tensorflow
+ tags:
+ - computer-vision
+ - video-processing
+ - siamese-network
+ - match-cut-detection
+ datasets:
+ - custom
+ metrics:
+ - accuracy
+ model-index:
+ - name: siamese_model
+   results:
+   - task:
+       type: image-similarity
+       subtype: match-cut-detection
+     metrics:
+     - type: accuracy
+       value: 0.956
+       name: Test Accuracy
+ ---
+
+ # Model Card for samanthajmichael/siamese_model.h5
+
+ This Siamese neural network detects match cuts in video sequences by comparing the optical flow features of frame pairs for visual similarity.
+
+ ## Model Details
+
+ ### Model Description
+
+ The model uses a Siamese architecture to compare pairs of video frames and decide whether they constitute a match cut, a film editing technique in which visually similar frames create a seamless transition between scenes. The model processes optical flow representations of the frames so that it focuses on motion patterns rather than raw pixel values.
+
+ - **Developed by:** samanthajmichael
+ - **Model type:** Siamese Neural Network
+ - **Language(s):** Not applicable (Computer Vision)
+ - **License:** Not specified (listed as `other` in the card metadata)
+ - **Finetuned from model:** EfficientNetB0 (used for initial feature extraction)
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/lasyaEd/ml_project
+ - **Demo:** Available as a Streamlit application for analyzing YouTube videos
47
+ ## Uses
48
+
49
+ ### Direct Use
50
+
51
+ The model can be used to:
52
+ 1. Detect match cuts in video sequences
53
+ 2. Find visually similar sections within videos
54
+ 3. Analyze motion patterns between frame pairs
55
+ 4. Support video editing and content analysis tasks
56
+
57
+ ### Downstream Use
58
+
59
+ The model can be integrated into:
60
+ - Video editing software for automated transition detection
61
+ - Content analysis tools for finding visual patterns
62
+ - YouTube video analysis applications (as demonstrated in the provided Streamlit app)
63
+ - Film studies tools for analyzing editing techniques
64
+
65
+ ### Out-of-Scope Use
66
+
67
+ This model is not designed for:
68
+ - Real-time video processing
69
+ - General object detection or recognition
70
+ - Scene classification without motion analysis
71
+ - Processing single frames in isolation
72
+
73
+ ## Bias, Risks, and Limitations
74
+
75
+ - The model's performance depends on the quality of optical flow extraction
76
+ - May be sensitive to video resolution and frame rate
77
+ - Performance may vary based on video content type and editing style
78
+ - Not optimized for real-time processing of high-resolution videos
79
+
80
+ ### Recommendations
81
+
82
+ Users should:
83
+ - Ensure input frames are properly preprocessed to 224x224 resolution
84
+ - Use high-quality video sources for best results
85
+ - Consider the model's confidence scores when making final decisions
86
+ - Validate results in the context of their specific use case
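One way to act on the confidence-score recommendation is to map the model's sigmoid output to three outcomes instead of two. This is a minimal sketch only: the function name `decide` and the thresholds 0.3 and 0.7 are assumptions for illustration, not values from the model's repository, and would need tuning on validation data.

```python
def decide(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a sigmoid similarity score to a decision.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= high:
        return "match"
    if score <= low:
        return "non-match"
    return "review"  # ambiguous score: defer to a human editor


if __name__ == "__main__":
    for s in (0.95, 0.50, 0.10):
        print(s, decide(s))
```

Scores that land in the middle band are flagged for manual review rather than forced into a match or non-match label.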
+
+ ## How to Get Started with the Model
+
+ ```python
+ import numpy as np
+ import tensorflow as tf
+ from huggingface_hub import from_pretrained_keras
+
+ # Load the model from the Hugging Face Hub
+ model = from_pretrained_keras("samanthajmichael/siamese_model.h5")
+
+ # preprocess_frame is a user-supplied helper (not defined in this card):
+ # it should resize a frame to 224x224 and normalize it to [0, 1]
+ frame1 = preprocess_frame(frame1)  # Shape: (224, 224, 3)
+ frame2 = preprocess_frame(frame2)  # Shape: (224, 224, 3)
+
+ # Add a batch dimension to each frame and get the similarity prediction
+ prediction = model.predict([np.array([frame1]), np.array([frame2])])
+ ```
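The snippet above calls `preprocess_frame` without defining it. The following is a hypothetical stand-in, assuming the input is an RGB `uint8` array of shape `(H, W, 3)`. It only resizes (nearest-neighbour, to avoid extra dependencies) and scales to [0, 1]. It does not compute optical flow, and the preprocessing in the model's repository may differ.

```python
import numpy as np


def preprocess_frame(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an HxWx3 uint8 frame to size x size and scale it to [0, 1].

    Hypothetical helper: uses nearest-neighbour sampling so that NumPy is
    the only dependency.
    """
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError("expected a frame of shape (H, W, 3)")
    height, width = frame.shape[:2]
    # Index of the source row/column that each output pixel samples from.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    resized = frame[rows[:, None], cols[None, :]]
    return resized.astype(np.float32) / 255.0


if __name__ == "__main__":
    dummy = np.random.randint(0, 256, size=(360, 640, 3), dtype=np.uint8)
    out = preprocess_frame(dummy)
    print(out.shape, out.dtype)
```

For production use, an interpolating resize from an image library would likely give smoother inputs than nearest-neighbour sampling.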
+
+ ## Training Details
+
+ ### Training Data
+
+ - Training set: 14,264 frame pairs (80% of the 17,830 pairs)
+ - Test set: 3,566 frame pairs (20%)
+ - Data derived from video frames with optical flow features
+ - Labels generated based on visual similarity thresholds
+
+ ### Training Procedure
+
+ #### Training Hyperparameters
+
+ - **Training regime:** fp32
+ - **Optimizer:** Adam
+ - **Loss function:** Binary Cross-Entropy
+ - **Batch size:** 64
+ - **Early stopping patience:** 3
+ - **Input shape:** (224, 224, 3)
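To illustrate what an early stopping patience of 3 means, here is a minimal sketch of the stopping rule in plain Python. It assumes the monitored quantity is a validation loss, which the card does not state, and the loss values are made up. With patience 3, a run stops at epoch 4 when no epoch after the first improves the monitored loss.

```python
def epochs_until_stop(val_losses, patience=3):
    """Return the 1-based epoch at which training stops.

    Training stops once the monitored loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


if __name__ == "__main__":
    # Made-up validation losses, not the model's actual training history.
    history = [0.20, 0.22, 0.21, 0.23, 0.19, 0.18]
    print(epochs_until_stop(history))  # prints 4
```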
+
+ ### Model Architecture
+
+ - Base network:
+   - Conv2D (32 filters) + ReLU + MaxPooling2D
+   - Conv2D (64 filters) + ReLU + MaxPooling2D
+   - Conv2D (128 filters) + ReLU + MaxPooling2D
+   - Flatten
+   - Dense (128 units)
+ - Similarity computed using absolute difference
+ - Final dense layer with sigmoid activation
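The layer list can be sanity-checked by counting parameters. The card does not state kernel sizes, padding, or pooling windows. The sketch below assumes 3x3 kernels, `same` padding, 2x2 max pooling, a 224x224x3 input, and a base network shared by both branches. Under those assumptions the count comes to 12,938,561, the total reported in this card, which suggests (but does not confirm) those settings.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_features, units):
    """Weights plus biases of a Dense layer."""
    return in_features * units + units


def siamese_param_count(side=224, channels=3, filters=(32, 64, 128),
                        embedding_units=128):
    """Count parameters under the assumptions stated in the lead-in."""
    total = 0
    for f in filters:
        total += conv2d_params(channels, f)
        channels = f
        side //= 2  # 'same' convolution keeps the size; 2x2 pooling halves it
    total += dense_params(side * side * channels, embedding_units)
    # Both branches share the base network, so it is counted once.
    # The absolute-difference step has no weights; the head is Dense(1).
    total += dense_params(embedding_units, 1)
    return total


if __name__ == "__main__":
    print(siamese_param_count())  # prints 12938561
```

Almost all of the weights sit in the Dense(128) layer that follows Flatten (28 x 28 x 128 inputs under these assumptions), so the convolutional stack itself is small.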
+
+ ## Evaluation
+
+ ### Testing Data, Factors & Metrics
+
+ - Evaluation performed on 3,566 frame pairs
+ - Balanced dataset of match and non-match pairs
+ - Primary metric: Binary classification accuracy
+
+ ### Results
+
+ - Test accuracy: 95.60%
+ - Test loss: 0.1675
+ - On the balanced test set, this accuracy is well above the 50% chance level for separating match cuts from non-matches
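For readers unfamiliar with the two reported metrics, the sketch below shows how binary accuracy and mean binary cross-entropy are typically computed from sigmoid outputs. The 0.5 threshold and the clipping constant are assumptions, and the labels and scores are made up for illustration, not taken from the test set.

```python
import math


def binary_accuracy(labels, probs, threshold=0.5):
    """Fraction of pairs whose thresholded score equals the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(labels, probs))
    return hits / len(labels)


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy, with scores clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


if __name__ == "__main__":
    # Made-up labels and scores for illustration only.
    y_true = [1, 0, 1, 0]
    y_prob = [0.9, 0.2, 0.4, 0.1]
    print(binary_accuracy(y_true, y_prob))       # prints 0.75
    print(binary_cross_entropy(y_true, y_prob))
```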
+
+ ## Environmental Impact
+
+ - Trained on Google Colab
+ - Training completed in 4 epochs with early stopping
+ - Relatively lightweight model with 12.9M parameters
+
+ ## Technical Specifications
+
+ ### Compute Infrastructure
+
+ - Training platform: Google Colab
+ - GPU requirements: Standard GPU runtime
+ - Inference can be performed on CPU for smaller workloads
+
+ ### Model Architecture and Objective
+
+ - Total parameters: 12,938,561 (49.36 MB)
+ - All parameters are trainable
+ - Model objective: Binary classification of frame pair similarity
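The reported size follows from the parameter count if each weight is stored as a 4-byte fp32 value and "MB" is read as binary megabytes (MiB). A short check, assuming that interpretation:

```python
def model_size_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Size of the weights in MiB (1 MiB = 1024 * 1024 bytes)."""
    return num_params * bytes_per_param / (1024 ** 2)


if __name__ == "__main__":
    print(f"{model_size_mib(12_938_561):.2f} MiB")  # prints 49.36 MiB
```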
+
+ ## Model Card Contact
+
+ For questions about the model, please contact samanthajmichael through GitHub or Hugging Face.
+ ---
  language:
  - en
  tags: