cakelens-v5
Open-source AI-generated video detection model
Please see the blog post and the open-source Python library and CLI tool cakelens-v5 for more details.
Installation
Install the package with its dependencies:
pip install cakelens-v5
Command Line Interface
The package provides a command-line tool, cakelens, for easy video detection:
Basic Usage
# Using Hugging Face Hub (recommended)
cakelens video.mp4
# Using local model file
cakelens video.mp4 --model-path model.pt
Options
- --model-path: Path to the model checkpoint file (optional; loads from Hugging Face Hub if not provided)
- --batch-size: Batch size for inference (default: 1)
- --device: Device to run inference on (cpu, cuda, mps); auto-detected if not specified
- --verbose, -v: Enable verbose logging
- --output: Output file path for results (JSON format)
Examples
# Basic detection (uses Hugging Face Hub)
cakelens video.mp4
# Using local model file
cakelens video.mp4 --model-path model.pt
# With custom batch size and device
cakelens video.mp4 --batch-size 4 --device cuda
# Save results to JSON file
cakelens video.mp4 --output results.json
# Verbose output
cakelens video.mp4 --verbose
Output
The tool provides:
- Real-time prediction percentages for each label
- Final mean predictions across all frames
- Option to save results in JSON format (see the sketch after this list)
- Detailed logging (with the --verbose flag)
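As a rough sketch of consuming the saved results, the snippet below reads a results.json file written with --output. The field names used here (video_filepath, frame_count, predictions) are an assumption borrowed from the verdict attributes in the programmatic example below; inspect your own results.json to confirm the actual schema.
import json
# Load results saved via: cakelens video.mp4 --output results.json
# NOTE: these field names are assumptions mirroring the verdict attributes
# shown in the Programmatic Usage section; verify them against a real file.
with open("results.json") as f:
    results = json.load(f)
print(f"Video: {results.get('video_filepath')}")
print(f"Frame count: {results.get('frame_count')}")
for i, prob in enumerate(results.get("predictions", [])):
    print(f"  Label {i}: {prob * 100:.2f}%")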
Programmatic Usage
You can also use the detection functionality programmatically in your Python code:
Basic Detection
import pathlib
from cakelens.detect import Detector
from cakelens.model import Model
# Create the model and load its weights from Hugging Face Hub
model = Model()
model.load_from_huggingface_hub()
# or, if you have a local model file:
# import torch
# model.load_state_dict(torch.load("model.pt")["model_state_dict"])
# Create the detector
detector = Detector(
    model=model,
    batch_size=1,
    device="cpu",  # or "cuda", "mps", or None for auto-detection
)
# Run detection on a single video
video_path = pathlib.Path("video.mp4")
verdict = detector.detect(video_path)
# Access the results
print(f"Video: {verdict.video_filepath}")
print(f"Frame count: {verdict.frame_count}")
print("Predictions:")
for i, prob in enumerate(verdict.predictions):
    print(f"  Label {i}: {prob * 100:.2f}%")
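Building on the same API, here is a minimal sketch for scanning a whole directory of videos; the videos/ path is a placeholder, and the snippet assumes only the Detector and verdict behavior shown above.
import pathlib
from cakelens.detect import Detector
from cakelens.model import Model
model = Model()
model.load_from_huggingface_hub()
detector = Detector(model=model, batch_size=1, device=None)  # auto-detect device
# Scan every .mp4 file in a (placeholder) videos/ directory
for video_path in sorted(pathlib.Path("videos").glob("*.mp4")):
    verdict = detector.detect(video_path)
    print(f"{verdict.video_filepath}: {verdict.frame_count} frames")
    for i, prob in enumerate(verdict.predictions):
        print(f"  Label {i}: {prob * 100:.2f}%")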
Labels
The model can detect the following labels:
- AI_GEN: Is the video AI-generated or not?
- ANIME_2D: Is the video in 2D anime style?
- ANIME_3D: Is the video in 3D anime style?
- VIDEO_GAME: Does the video look like a video game?
- KLING: Is the video generated by Kling?
- HIGGSFIELD: Is the video generated by Higgsfield?
- WAN: Is the video generated by Wan?
- MIDJOURNEY: Is the video generated using images from Midjourney?
- HAILUO: Is the video generated by Hailuo?
- RAY: Is the video generated by Ray?
- VEO: Is the video generated by Veo?
- RUNWAY: Is the video generated by Runway?
- SORA: Is the video generated by Sora?
- CHATGPT: Is the video generated using images from ChatGPT?
- PIKA: Is the video generated by Pika?
- HUNYUAN: Is the video generated by Hunyuan?
- VIDU: Is the video generated by Vidu?
Note: The AI_GEN label is the most accurate as it has the most training data. Other labels have limited training data and may be less accurate.
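Continuing from the verdict object in the Basic Detection example, you can pair label names with probabilities as sketched below. The list order here is an assumption that verdict.predictions follows the label list above; verify it against the library's own label definitions.
# Assumed label order mirroring the list above; verify against the
# library's label definitions before relying on it.
LABELS = [
    "AI_GEN", "ANIME_2D", "ANIME_3D", "VIDEO_GAME", "KLING", "HIGGSFIELD",
    "WAN", "MIDJOURNEY", "HAILUO", "RAY", "VEO", "RUNWAY", "SORA",
    "CHATGPT", "PIKA", "HUNYUAN", "VIDU",
]
for name, prob in zip(LABELS, verdict.predictions):
    print(f"{name}: {prob * 100:.2f}%")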
Accuracy
The precision-recall (PR) curve of the model is shown below:
At a threshold of 0.5, the model has a precision of 0.77 and a recall of 0.74. The dataset contains 5,093 videos for training and 498 videos for validation. Please note that the model is not perfect and may make mistakes.
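For intuition: at this operating point, a precision of 0.77 means that roughly 77% of the videos the model flags as AI-generated actually are, while a recall of 0.74 means it catches about 74% of the AI-generated videos in the validation set. Combining the two, the F1 score is 2 × 0.77 × 0.74 / (0.77 + 0.74) ≈ 0.75.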