---
license: mit
datasets:
- abdallahwagih/ucf101-videos
metrics:
- accuracy
base_model:
- google/mobilenet_v2_1.0_224
pipeline_tag: video-classification
tags:
- action-recognition
- cnn-gru
- video-classification
- ucf101
- action
- mobilenetv2
- deep-learning
- pytorch
---
# Action Detection with CNN-GRU on MobileNetV2
## Overview
This model performs human action classification on videos using a CNN-GRU architecture built on top of **MobileNetV2 (1.0, 224)** features and trained on the [UCF101](https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos) dataset.
It is intended for recognizing actions in short, trimmed video clips.
***
## Model Details
- **Base model:** `google/mobilenet_v2_1.0_224`
- **Architecture:** CNN-GRU

- **Dataset:** [UCF101 - Action Recognition Dataset](https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos)
- **Task:** Video Classification (Action Recognition)
- **Metrics:** Accuracy
- **License:** MIT
***
## Usage
### Requirements
```bash
pip install torch torchvision opencv-python
```
### Example Code
```python
import cv2

from action_model import load_action_model, preprocess_frames, predict_action

# Load the trained checkpoint; num_classes must match the checkpoint's output layer
model = load_action_model(model_path="best_model.pt", device="cpu", num_classes=5)

# Read all frames from the video
cap = cv2.VideoCapture("path_to_video.mp4")
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

# Preprocess the first 16 frames for model input
clip_tensor = preprocess_frames(frames[:16], seq_len=16, resize=(112, 112))

# Predict action
result = predict_action(model, clip_tensor, device="cpu")
print(result)
```
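
The example passes the first 16 frames (`frames[:16]`), which covers only the opening moments of a longer video. One common alternative is to pick frames evenly across the whole clip. The sketch below shows that idea with a hypothetical helper, `sample_frame_indices`; it is not part of `action_model`, and whether the model was trained with this sampling scheme is not stated here.

```python
def sample_frame_indices(num_frames: int, seq_len: int = 16) -> list[int]:
    """Return seq_len indices spread evenly over range(num_frames).

    Hypothetical helper: videos shorter than seq_len repeat frames so the
    result always has exactly seq_len entries.
    """
    if num_frames <= 0:
        raise ValueError("video contains no frames")
    # Take the centre of each of seq_len equal-width segments.
    return [
        min(num_frames - 1, int((i + 0.5) * num_frames / seq_len))
        for i in range(seq_len)
    ]


indices = sample_frame_indices(num_frames=160, seq_len=16)
print(indices)
```

With the frame list from the example, the selected clip would be `[frames[i] for i in sample_frame_indices(len(frames))]`, which can then be passed to `preprocess_frames` in place of `frames[:16]`.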
***
## Training & Evaluation
- Trained on UCF101 split 1 with a MobileNetV2 backbone.
- Sequence length: 16 frames per clip.
- Metric: Top-1 classification accuracy.
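
Top-1 accuracy counts a clip as correct when the highest-scoring class equals the ground-truth label. As a minimal sketch of how it could be computed from per-clip scores (the function name `top1_accuracy` and the sample numbers are illustrative assumptions, not results from this model):

```python
def top1_accuracy(logits: list[list[float]], labels: list[int]) -> float:
    """Fraction of clips whose highest-scoring class equals the true label."""
    if len(logits) != len(labels) or not labels:
        raise ValueError("need one label per clip and at least one clip")
    correct = 0
    for scores, label in zip(logits, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)  # argmax
        correct += int(predicted == label)
    return correct / len(labels)


# Made-up scores for three clips over three classes, purely illustrative.
scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]
print(top1_accuracy(scores, labels=[1, 0, 0]))
```

Here the third clip is scored highest for class 2 while its label is 0, so two of the three clips count as correct.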
***
## Intended Use & Limitations
**Intended for:**
- Video analytics
- Educational research
- Baseline for video action recognition tasks

**Limitations:**
- Predicts only the classes in the UCF101 subset it was trained on
- Requires short, trimmed video clips
- Not robust to out-of-domain videos or very low-resolution input
***
## Tags
`action` · `cnn-gru` · `video-classification` · `ucf101` · `mobilenetv2` · `deep-learning` · `torch`