---
license: mit
datasets:
- abdallahwagih/ucf101-videos
metrics:
- accuracy
base_model:
- google/mobilenet_v2_1.0_224
pipeline_tag: video-classification
tags:
- action-recognition
- cnn-gru
- video-classification
- ucf101
- action
- mobilenetv2
- deep-learning
- pytorch
---

# Action Detection with CNN-GRU on MobileNetV2

## Overview

This model performs human action classification on videos using a CNN-GRU architecture built on top of **MobileNetV2 (1.0, 224)** features and trained on the [UCF101](https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos) dataset. It is well-suited for recognizing actions from short trimmed video clips.

***

## Model Details

- **Base model:** `google/mobilenet_v2_1.0_224`
- **Architecture:** CNN-GRU
  ![CNN-GRU Architecture](./cnn_architecture.png)
- **Dataset:** UCF101 - Action Recognition Dataset (https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos)
- **Task:** Video Classification (Action Recognition)
- **Metrics:** Accuracy
- **License:** MIT

***

## Usage

### Requirements

```bash
pip install torch torchvision opencv-python
```

### Example Code

```python
from action_model import load_action_model, preprocess_frames, predict_action
import cv2

# Load model
model = load_action_model(model_path="best_model.pt", device="cpu", num_classes=5)

# Read frames from video
cap = cv2.VideoCapture("path_to_video.mp4")
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

# Preprocess frames for model input
clip_tensor = preprocess_frames(frames[:16], seq_len=16, resize=(112,112))

# Predict action
result = predict_action(model, clip_tensor, device="cpu")
print(result)
```

***

## Training & Evaluation

- Trained on UCF101 split 1 with MobileNetV2 backbone.
- Sequence length: 16 frames per clip.
- Metric: Top-1 classification accuracy.
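
The sequence length and metric listed above can be made concrete with a small sketch. It is illustrative only: `sample_frame_indices` and `top1_accuracy` are hypothetical helper names, not part of this repository, and the card does not say how the 16 frames were chosen during training, so the evenly spaced sampling shown here is an assumption rather than the author's method.

```python
from typing import List, Sequence


def sample_frame_indices(num_frames: int, seq_len: int = 16) -> List[int]:
    """Pick seq_len evenly spaced frame indices from a clip.

    Hypothetical helper (assumption): the model card does not state the
    sampling strategy used in training. Clips shorter than seq_len repeat
    frames so the result always has exactly seq_len entries.
    """
    if num_frames <= 0:
        raise ValueError("clip has no frames")
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    # Midpoint of each of seq_len equal-width segments, in integer arithmetic.
    return [((2 * i + 1) * num_frames) // (2 * seq_len) for i in range(seq_len)]


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of clips whose highest-scoring class matches the true label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


if __name__ == "__main__":
    # A 32-frame clip reduced to 16 indices spread over the whole clip.
    print(sample_frame_indices(32))
    # Four clips, three classes; made-up scores purely for illustration.
    demo_scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
    demo_labels = [1, 0, 2, 0]
    print(top1_accuracy(demo_scores, demo_labels))
```

If sampling like this were used at inference time, the indices would select frames from the decoded list before preprocessing, instead of taking only the first 16 frames. Whether that matches the training setup is not stated in this card.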

***

## Intended Use & Limitations

**Intended for:**
- Video analytics
- Educational research
- Baseline for video action recognition tasks

**Limitations:**
- Predicts only UCF101 subset classes
- Needs short, trimmed video clips
- Not robust to out-of-domain videos or very low-res input

***

## Tags

`action` · `cnn-gru` · `video-classification` · `ucf101` · `mobilenetv2` · `deep-learning` · `torch`