---
license: mit
pipeline_tag: video-classification
---

# cakelens-v5

Open-source AI-generated video detection model.

Please see the [blog post](https://fangpenlin.com/posts/2025/07/30/open-source-cakelens-v5/) and the open-source Python library plus CLI tool [cakelens-v5](https://github.com/LaunchPlatform/cakelens-v5) for more details.

## Installation

Install the package with its dependencies:

```bash
pip install cakelens-v5
```

## Command Line Interface

The package provides a command line tool, `cakelens`, for easy video detection.

### Basic Usage

```bash
# Using Hugging Face Hub (recommended)
cakelens video.mp4

# Using a local model file
cakelens video.mp4 --model-path model.pt
```

### Options

- `--model-path`: Path to the model checkpoint file (optional; the model is loaded from Hugging Face Hub if not provided)
- `--batch-size`: Batch size for inference (default: 1)
- `--device`: Device to run inference on (`cpu`, `cuda`, `mps`); auto-detected if not specified
- `--verbose, -v`: Enable verbose logging
- `--output`: Output file path for results (JSON format)

### Examples

```bash
# Basic detection (uses Hugging Face Hub)
cakelens video.mp4

# Using a local model file
cakelens video.mp4 --model-path model.pt

# With a custom batch size and device
cakelens video.mp4 --batch-size 4 --device cuda

# Save results to a JSON file
cakelens video.mp4 --output results.json

# Verbose output
cakelens video.mp4 --verbose
```

### Output

The tool provides:

- Real-time prediction percentages for each label
- Final mean predictions across all frames
- An option to save results in JSON format (`--output`)
- Detailed logging (with the `--verbose` flag)

## Programmatic Usage

You can also use the detection functionality programmatically in your Python code:

### Basic Detection

```python
import pathlib

from cakelens.detect import Detector
from cakelens.model import Model

# Create the model and load its weights from Hugging Face Hub
model = Model()
model.load_from_huggingface_hub()
# Or, if you have a local model file (this path additionally requires `import torch`):
# model.load_state_dict(torch.load("model.pt")["model_state_dict"])

# Create the detector
detector = Detector(
    model=model,
    batch_size=1,
    device="cpu",  # or "cuda", "mps", or None for auto-detection
)

# Run detection
video_path = pathlib.Path("video.mp4")
verdict = detector.detect(video_path)

# Access the results
print(f"Video: {verdict.video_filepath}")
print(f"Frame count: {verdict.frame_count}")
print("Predictions:")
for i, prob in enumerate(verdict.predictions):
    print(f"  Label {i}: {prob * 100:.2f}%")
```

## Labels

The model can detect the following labels:

- **AI_GEN**: Is the video AI-generated?
- **ANIME_2D**: Is the video in 2D anime style?
- **ANIME_3D**: Is the video in 3D anime style?
- **VIDEO_GAME**: Does the video look like a video game?
- **KLING**: Is the video generated by Kling?
- **HIGGSFIELD**: Is the video generated by Higgsfield?
- **WAN**: Is the video generated by Wan?
- **MIDJOURNEY**: Is the video generated using images from Midjourney?
- **HAILUO**: Is the video generated by Hailuo?
- **RAY**: Is the video generated by Ray?
- **VEO**: Is the video generated by Veo?
- **RUNWAY**: Is the video generated by Runway?
- **SORA**: Is the video generated by Sora?
- **CHATGPT**: Is the video generated using images from ChatGPT?
- **PIKA**: Is the video generated by Pika?
- **HUNYUAN**: Is the video generated by Hunyuan?
- **VIDU**: Is the video generated by Vidu?

> **Note**: The **AI_GEN** label is the most accurate, as it has the most training data.
> Other labels have limited training data and may be less accurate.
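Since `verdict.predictions` in the Basic Detection example is a flat list of per-label probabilities, it can be convenient to pair each score with a label name. The sketch below assumes the predictions are ordered to match the label list above; that ordering is an assumption, not something confirmed here, so verify it against the cakelens-v5 source before relying on it:

```python
# A minimal sketch pairing each probability with a label name.
# ASSUMPTION: verdict.predictions follows the order of the label list
# above; check the cakelens-v5 source before relying on this.
LABELS = [
    "AI_GEN", "ANIME_2D", "ANIME_3D", "VIDEO_GAME", "KLING",
    "HIGGSFIELD", "WAN", "MIDJOURNEY", "HAILUO", "RAY", "VEO",
    "RUNWAY", "SORA", "CHATGPT", "PIKA", "HUNYUAN", "VIDU",
]


def print_named_predictions(verdict) -> None:
    """Print per-label probabilities, highest first."""
    named = dict(zip(LABELS, verdict.predictions))
    for label, prob in sorted(named.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{label:>10}: {prob * 100:6.2f}%")
```

Called with the `verdict` from the Basic Detection example, this prints a ranked, human-readable summary instead of bare label indices.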
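Because the model weights only need to be loaded once, a single `Detector` can be reused across many files. Here is a short sketch using only the API shown in the Basic Detection example above; the `videos` directory, the 0.5 decision threshold, and the assumption that **AI_GEN** is prediction index 0 are all illustrative, not confirmed by the library:

```python
import pathlib

from cakelens.detect import Detector
from cakelens.model import Model

# Load the model once and reuse the detector for every file
model = Model()
model.load_from_huggingface_hub()
detector = Detector(model=model, batch_size=4, device=None)  # auto-detect device

# Scan every .mp4 in a folder; the path and threshold are illustrative
for video_path in sorted(pathlib.Path("videos").glob("*.mp4")):
    verdict = detector.detect(video_path)
    ai_gen_prob = verdict.predictions[0]  # ASSUMPTION: AI_GEN is index 0
    flag = "possibly AI-generated" if ai_gen_prob > 0.5 else "likely real"
    print(f"{video_path.name}: {ai_gen_prob * 100:.2f}% AI_GEN ({flag})")
```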
## Accuracy

The precision-recall (PR) curve of the model is shown below: