---
language: en
license: other
library_name: tensorflow
tags:
- computer-vision
- video-processing
- siamese-network
- match-cut-detection
datasets:
- custom
metrics:
- accuracy
model-index:
- name: siamese_model
  results:
  - task:
      type: image-similarity
      subtype: match-cut-detection
    metrics:
      - type: accuracy
        value: 0.956
        name: Test Accuracy
---

# Model Card for samanthajmichael/siamese_model.h5

This Siamese neural network model detects match cuts in video sequences by analyzing the visual similarity between frame pairs using optical flow features.

## Model Details

### Model Description

The model uses a Siamese architecture to compare pairs of video frames and determine whether they constitute a match cut, a film editing technique in which visually similar frames create a seamless transition between scenes. The model processes optical flow representations of video frames, rather than raw pixel values, so that it focuses on motion patterns; one possible way to produce such inputs is sketched after the details below.

- **Developed by:** samanthajmichael
- **Model type:** Siamese Neural Network
- **Language(s):** Not applicable (Computer Vision)
- **License:** Not specified
- **Finetuned from model:** EfficientNetB0 (used for initial feature extraction)
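
The card does not state how the optical flow inputs are produced. Purely as an illustrative sketch, one common approach is OpenCV's dense Farneback flow rendered as a 3-channel HSV image; both the algorithm and the encoding here are assumptions, not the documented pipeline:

```python
import cv2
import numpy as np

def flow_image(prev_frame, next_frame):
    """Render dense optical flow between two BGR frames as a 3-channel image.

    Farneback flow and the HSV encoding are assumptions; the card does not
    say which flow algorithm or encoding was actually used.
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0
    )
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2                              # direction -> hue
    hsv[..., 1] = 255                                                # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # magnitude -> value
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```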

### Model Sources
- **Repository:** https://github.com/lasyaEd/ml_project
- **Demo:** Available as a Streamlit application for analyzing YouTube videos

## Uses

### Direct Use

The model can be used to:
1. Detect match cuts in video sequences
2. Find visually similar sections within videos
3. Analyze motion patterns between frame pairs
4. Support video editing and content analysis tasks

### Downstream Use

The model can be integrated into:
- Video editing software for automated transition detection
- Content analysis tools for finding visual patterns
- YouTube video analysis applications (as demonstrated in the provided Streamlit app)
- Film studies tools for analyzing editing techniques

### Out-of-Scope Use

This model is not designed for:
- Real-time video processing
- General object detection or recognition
- Scene classification without motion analysis
- Processing single frames in isolation

## Bias, Risks, and Limitations

- The model's performance depends on the quality of optical flow extraction
- May be sensitive to video resolution and frame rate
- Performance may vary based on video content type and editing style
- Not optimized for real-time processing of high-resolution videos

### Recommendations

Users should:
- Ensure input frames are preprocessed to 224x224 resolution and normalized to [0, 1] (a sketch follows this list)
- Use high-quality video sources for best results
- Consider the model's confidence scores when making final decisions
- Validate results in the context of their specific use case
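
A minimal preprocessing sketch consistent with these recommendations; the INTER_AREA interpolation is an assumption, while the 224x224 size and [0, 1] scaling come from the card itself:

```python
import cv2
import numpy as np

def preprocess_frame(frame):
    """Resize a frame to 224x224 and scale pixel values to [0, 1]."""
    resized = cv2.resize(frame, (224, 224), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0
```

This is the `preprocess_frame` helper assumed by the quick-start snippet below.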

## How to Get Started with the Model

```python
from huggingface_hub import from_pretrained_keras
import numpy as np
import tensorflow as tf

# Load the model
model = from_pretrained_keras("samanthajmichael/siamese_model.h5")

# Preprocess your frame pairs: 224x224 resolution, normalized to [0, 1]
# (see the preprocess_frame sketch under Recommendations above)
frame1 = preprocess_frame(frame1)  # Shape: (224, 224, 3)
frame2 = preprocess_frame(frame2)  # Shape: (224, 224, 3)

# Get similarity prediction for a batch of one frame pair
prediction = model.predict([np.array([frame1]), np.array([frame2])])
```
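
The network ends in a sigmoid, so `prediction` holds a score in [0, 1]. A conventional (assumed) reading is to treat scores above 0.5 as a match, e.g. `is_match = prediction[0][0] > 0.5`; the card itself does not specify a decision threshold.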

## Training Details

### Training Data

- Training set: 14,264 frame pairs
- Test set: 3,566 frame pairs
- Data derived from video frames with optical flow features
- Labels generated based on visual similarity thresholds

### Training Procedure

#### Training Hyperparameters

- **Training regime:** fp32
- Optimizer: Adam
- Loss function: Binary Cross-Entropy
- Batch size: 64
- Early stopping patience: 3
- Input shape: (224, 224, 3)
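
These settings map directly onto a standard tf.keras training loop. The sketch below is an assumption-laden reconstruction: `build_siamese_model()` refers to the architecture sketch in the next section, and `x1_train`, `x2_train`, `y_train` are hypothetical arrays of shape (N, 224, 224, 3) and (N,):

```python
import tensorflow as tf

model = build_siamese_model()  # see the Model Architecture sketch below
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

early_stop = tf.keras.callbacks.EarlyStopping(
    patience=3,                 # matches the reported early-stopping patience
    restore_best_weights=True,  # assumption; not stated in the card
)

model.fit(
    [x1_train, x2_train], y_train,
    batch_size=64,
    epochs=20,                  # upper bound; training reportedly stopped after 4 epochs
    validation_split=0.1,       # assumption; the card does not describe a validation split
    callbacks=[early_stop],
)
```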

### Model Architecture

- Base network:
  - Conv2D (32 filters) + ReLU + MaxPooling2D
  - Conv2D (64 filters) + ReLU + MaxPooling2D
  - Conv2D (128 filters) + ReLU + MaxPooling2D
  - Flatten
  - Dense (128 units)
- Similarity computed using absolute difference
- Final dense layer with sigmoid activation
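
A hedged tf.keras reconstruction of this architecture; kernel sizes, padding, and bias settings are assumptions (the card lists only filter counts), so the exact parameter count may differ slightly from the reported ~12.9M:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_siamese_model(input_shape=(224, 224, 3)):
    # Shared base network that embeds a single frame
    base = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128),
    ])

    frame_a = layers.Input(shape=input_shape)
    frame_b = layers.Input(shape=input_shape)
    emb_a = base(frame_a)
    emb_b = base(frame_b)

    # Similarity from the element-wise absolute difference of the embeddings
    diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([emb_a, emb_b])
    output = layers.Dense(1, activation="sigmoid")(diff)
    return models.Model(inputs=[frame_a, frame_b], outputs=output)
```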

## Evaluation

### Testing Data, Factors & Metrics

- Evaluation performed on 3,566 frame pairs
- Balanced dataset of match and non-match pairs
- Primary metric: Binary classification accuracy
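
Reproducing this evaluation reduces to a single Keras call, assuming `model` is the trained network from the sections above and `x1_test`, `x2_test`, `y_test` are hypothetical arrays shaped (3566, 224, 224, 3) and (3566,):

```python
loss, accuracy = model.evaluate([x1_test, x2_test], y_test, batch_size=64)
print(f"test loss: {loss:.4f}, test accuracy: {accuracy:.2%}")
```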

### Results

- Test accuracy: 95.60%
- Test loss: 0.1675
- Model shows strong performance in distinguishing match cuts from non-matches

## Environmental Impact

- Trained on Google Colab
- Training completed in 4 epochs with early stopping
- Relatively lightweight model with 12.9M parameters

## Technical Specifications

### Compute Infrastructure

- Training platform: Google Colab
- GPU requirements: Standard GPU runtime
- Inference can be performed on CPU for smaller workloads

### Model Architecture and Objective

Total parameters: 12,938,561 (49.36 MB)
- All parameters are trainable
- Model objective: Binary classification of frame pair similarity

## Model Card Contact

For questions about the model, please contact samanthajmichael through GitHub or Hugging Face.