---
license: mit
datasets:
- abdallahwagih/ucf101-videos
metrics:
- accuracy
base_model:
- google/mobilenet_v2_1.0_224
pipeline_tag: video-classification

tags:
- action-recognition
- cnn-gru
- video-classification
- ucf101
- action
- mobilenetv2
- deep-learning
- pytorch
---

# Action Detection with CNN-GRU on MobileNetV2

## Overview

This model performs human action classification on videos using a CNN-GRU architecture built on top of **MobileNetV2 (1.0, 224)** features and trained on the [UCF101](https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos) dataset.  
It is well suited to recognizing actions in short, trimmed video clips.

***

## Model Details

- **Base model:** `google/mobilenet_v2_1.0_224`
- **Architecture:** CNN-GRU (a minimal PyTorch sketch follows this list)

  ![CNN-GRU Architecture](./cnn_architecture.png)

- **Dataset:** [UCF101 - Action Recognition Dataset](https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos)
- **Task:** Video Classification (Action Recognition)
- **Metrics:** Accuracy
- **License:** MIT
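
As a rough illustration of the CNN-GRU design above: MobileNetV2 features are pooled per frame, fed through a GRU, and the final hidden state is classified. The hidden size and the ImageNet-initialized backbone in this sketch are assumptions for illustration only; the `action_model` module and the diagram above define the actual trained model.

```python
import torch
import torch.nn as nn
from torchvision import models

class CNNGRU(nn.Module):
    """MobileNetV2 feature extractor followed by a GRU over the frame sequence."""

    def __init__(self, num_classes=5, hidden_size=256):  # hidden_size is an assumption
        super().__init__()
        backbone = models.mobilenet_v2(weights="IMAGENET1K_V1")
        self.features = backbone.features            # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)          # (B*T, 1280, 1, 1)
        self.gru = nn.GRU(1280, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                        # clips: (B, T, C, H, W)
        b, t, c, h, w = clips.shape
        x = self.features(clips.view(b * t, c, h, w))
        x = self.pool(x).flatten(1).view(b, t, -1)   # (B, T, 1280) per-frame features
        _, h_n = self.gru(x)                         # final hidden state
        return self.fc(h_n[-1])                      # (B, num_classes) logits
```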

***

## Usage

### Requirements

```bash
pip install torch torchvision opencv-python
```

### Example Code

```python
from action_model import load_action_model, preprocess_frames, predict_action
import cv2

# Load model
model = load_action_model(model_path="best_model.pt", device="cpu", num_classes=5)

# Read frames from video
cap = cv2.VideoCapture("path_to_video.mp4")
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

# Preprocess the first 16 frames for model input (see the sampling note below)
clip_tensor = preprocess_frames(frames[:16], seq_len=16, resize=(112, 112))

# Predict action
result = predict_action(model, clip_tensor, device="cpu")
print(result)
```
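
For videos longer than a second or two, using only the first 16 frames can miss the action entirely. Sampling 16 frames uniformly across the whole clip is a common alternative; the helper below is a hypothetical sketch (not part of `action_model`) that assumes `frames` holds every decoded frame, as in the example above.

```python
import numpy as np

def sample_uniform(frames, seq_len=16):
    """Pick seq_len frames spread evenly over the clip.

    Repeats the last frame when the clip is shorter than seq_len.
    Illustrative helper; not provided by action_model.
    """
    if len(frames) >= seq_len:
        idx = np.linspace(0, len(frames) - 1, seq_len).astype(int)
        return [frames[i] for i in idx]
    return frames + [frames[-1]] * (seq_len - len(frames))

clip_tensor = preprocess_frames(sample_uniform(frames), seq_len=16, resize=(112, 112))
```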

***

## Training & Evaluation

- Trained on UCF101 (split 1) with a MobileNetV2 backbone.
- Sequence length: 16 frames per clip.
- Metric: Top-1 classification accuracy.
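
For reference, top-1 accuracy can be computed with a short evaluation loop like the sketch below, assuming a `DataLoader` that yields `(clips, labels)` batches matching the model's input shape:

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cpu"):
    """Fraction of clips whose argmax prediction matches the label."""
    model.eval()
    correct = total = 0
    for clips, labels in loader:
        logits = model(clips.to(device))
        correct += (logits.argmax(dim=1) == labels.to(device)).sum().item()
        total += labels.numel()
    return correct / total
```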

***

## Intended Use & Limitations

**Intended for:**
- Video analytics
- Educational research
- Baseline for video action recognition tasks

**Limitations:**
- Predicts only UCF101 subset classes
- Needs short, trimmed video clips
- Not robust to out-of-domain videos or very low-resolution input

***

## Tags

`action` · `cnn-gru` · `video-classification` · `ucf101` · `mobilenetv2` · `deep-learning` · `pytorch`