---
language: en
license: other
library_name: tensorflow
tags:
- computer-vision
- video-processing
- siamese-network
- match-cut-detection
datasets:
- custom
metrics:
- accuracy
model-index:
- name: siamese_model
  results:
  - task:
      type: image-similarity
      subtype: match-cut-detection
    metrics:
      - type: accuracy
        value: 0.956
        name: Test Accuracy
---

# Model Card for samanthajmichael/siamese_model.h5

This Siamese neural network model detects match cuts in video sequences by analyzing the visual similarity between frame pairs using optical flow features.

## Model Details

### Model Description

The model uses a Siamese architecture to compare pairs of video frames and determine whether they constitute a match cut, a film editing technique in which visually similar frames create a seamless transition between scenes. The model processes optical flow representations of video frames, rather than raw pixel values, so that it focuses on motion patterns; one possible way to produce such inputs is sketched after the details below.

- **Developed by:** samanthajmichael
- **Model type:** Siamese Neural Network
- **Language(s):** Not applicable (Computer Vision)
- **License:** Not specified
- **Finetuned from model:** EfficientNetB0 (used for initial feature extraction)
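
The card does not state how the optical flow inputs are produced. Purely as an illustrative sketch, one common approach is OpenCV's dense Farneback flow rendered as a 3-channel HSV image; both the algorithm and the encoding here are assumptions, not the documented pipeline:

```python
import cv2
import numpy as np

def flow_image(prev_frame, next_frame):
    """Render dense optical flow between two BGR frames as a 3-channel image.

    Farneback flow and the HSV encoding are assumptions; the card does not
    say which flow algorithm or encoding was actually used.
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0
    )
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2                              # direction -> hue
    hsv[..., 1] = 255                                                # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # magnitude -> value
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```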

### Model Sources
- **Repository:** https://github.com/lasyaEd/ml_project
- **Demo:** Available as a Streamlit application for analyzing YouTube videos

## Uses

### Direct Use

The model can be used to:
1. Detect match cuts in video sequences
2. Find visually similar sections within videos
3. Analyze motion patterns between frame pairs
4. Support video editing and content analysis tasks

### Downstream Use

The model can be integrated into:
- Video editing software for automated transition detection
- Content analysis tools for finding visual patterns
- YouTube video analysis applications (as demonstrated in the provided Streamlit app)
- Film studies tools for analyzing editing techniques

### Out-of-Scope Use

This model is not designed for:
- Real-time video processing
- General object detection or recognition
- Scene classification without motion analysis
- Processing single frames in isolation

## Bias, Risks, and Limitations

- The model's performance depends on the quality of optical flow extraction
- May be sensitive to video resolution and frame rate
- Performance may vary based on video content type and editing style
- Not optimized for real-time processing of high-resolution videos

### Recommendations

Users should:
- Ensure input frames are preprocessed to 224x224 resolution and normalized to [0, 1] (a sketch follows this list)
- Use high-quality video sources for best results
- Consider the model's confidence scores when making final decisions
- Validate results in the context of their specific use case
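
A minimal preprocessing sketch consistent with these recommendations; the INTER_AREA interpolation is an assumption, while the 224x224 size and [0, 1] scaling come from the card itself:

```python
import cv2
import numpy as np

def preprocess_frame(frame):
    """Resize a frame to 224x224 and scale pixel values to [0, 1]."""
    resized = cv2.resize(frame, (224, 224), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0
```

This is the `preprocess_frame` helper assumed by the quick-start snippet below.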

## How to Get Started with the Model

```python
from huggingface_hub import from_pretrained_keras
import numpy as np
import tensorflow as tf

# Load the model
model = from_pretrained_keras("samanthajmichael/siamese_model.h5")

# Preprocess your frame pairs: 224x224 resolution, normalized to [0, 1]
# (see the preprocess_frame sketch under Recommendations above)
frame1 = preprocess_frame(frame1)  # Shape: (224, 224, 3)
frame2 = preprocess_frame(frame2)  # Shape: (224, 224, 3)

# Get similarity prediction for a batch of one frame pair
prediction = model.predict([np.array([frame1]), np.array([frame2])])
```
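
The network ends in a sigmoid, so `prediction` holds a score in [0, 1]. A conventional (assumed) reading is to treat scores above 0.5 as a match, e.g. `is_match = prediction[0][0] > 0.5`; the card itself does not specify a decision threshold.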

## Training Details

### Training Data

- Training set: 14,264 frame pairs
- Test set: 3,566 frame pairs
- Data derived from video frames with optical flow features
- Labels generated based on visual similarity thresholds

### Training Procedure

#### Training Hyperparameters

- **Training regime:** fp32
- Optimizer: Adam
- Loss function: Binary Cross-Entropy
- Batch size: 64
- Early stopping patience: 3
- Input shape: (224, 224, 3)
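
These settings map directly onto a standard tf.keras training loop. The sketch below is an assumption-laden reconstruction: `build_siamese_model()` refers to the architecture sketch in the next section, and `x1_train`, `x2_train`, `y_train` are hypothetical arrays of shape (N, 224, 224, 3) and (N,):

```python
import tensorflow as tf

model = build_siamese_model()  # see the Model Architecture sketch below
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

early_stop = tf.keras.callbacks.EarlyStopping(
    patience=3,                 # matches the reported early-stopping patience
    restore_best_weights=True,  # assumption; not stated in the card
)

model.fit(
    [x1_train, x2_train], y_train,
    batch_size=64,
    epochs=20,                  # upper bound; training reportedly stopped after 4 epochs
    validation_split=0.1,       # assumption; the card does not describe a validation split
    callbacks=[early_stop],
)
```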

### Model Architecture

- Base network:
  - Conv2D (32 filters) + ReLU + MaxPooling2D
  - Conv2D (64 filters) + ReLU + MaxPooling2D
  - Conv2D (128 filters) + ReLU + MaxPooling2D
  - Flatten
  - Dense (128 units)
- Similarity computed using absolute difference
- Final dense layer with sigmoid activation
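
A hedged tf.keras reconstruction of this architecture; kernel sizes, padding, and bias settings are assumptions (the card lists only filter counts), so the exact parameter count may differ slightly from the reported ~12.9M:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_siamese_model(input_shape=(224, 224, 3)):
    # Shared base network that embeds a single frame
    base = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128),
    ])

    frame_a = layers.Input(shape=input_shape)
    frame_b = layers.Input(shape=input_shape)
    emb_a = base(frame_a)
    emb_b = base(frame_b)

    # Similarity from the element-wise absolute difference of the embeddings
    diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([emb_a, emb_b])
    output = layers.Dense(1, activation="sigmoid")(diff)
    return models.Model(inputs=[frame_a, frame_b], outputs=output)
```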

## Evaluation

### Testing Data, Factors & Metrics

- Evaluation performed on 3,566 frame pairs
- Balanced dataset of match and non-match pairs
- Primary metric: Binary classification accuracy
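
Reproducing this evaluation reduces to a single Keras call, assuming `model` is the trained network from the sections above and `x1_test`, `x2_test`, `y_test` are hypothetical arrays shaped (3566, 224, 224, 3) and (3566,):

```python
loss, accuracy = model.evaluate([x1_test, x2_test], y_test, batch_size=64)
print(f"test loss: {loss:.4f}, test accuracy: {accuracy:.2%}")
```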

### Results

- Test accuracy: 95.60%
- Test loss: 0.1675
- Model shows strong performance in distinguishing match cuts from non-matches

## Environmental Impact

- Trained on Google Colab
- Training completed in 4 epochs with early stopping
- Relatively lightweight model with 12.9M parameters

## Technical Specifications

### Compute Infrastructure

- Training platform: Google Colab
- GPU requirements: Standard GPU runtime
- Inference can be performed on CPU for smaller workloads

### Model Architecture and Objective

Total parameters: 12,938,561 (49.36 MB)
- All parameters are trainable
- Model objective: Binary classification of frame pair similarity

## Model Card Contact

For questions about the model, please contact samanthajmichael through GitHub or Hugging Face.