---
library_name: transformers
tags:
- image-classification
- vision
- avatar
- katara
datasets:
- deepghs/nozomi_standalone_full
language:
- en
metrics:
- f1
base_model:
- facebook/dinov2-small
pipeline_tag: image-classification
---

# Model Card for Katara Detector

This model identifies whether an image contains Katara from Avatar: The Last Airbender. It achieves 96.0% accuracy and a 96.1% F1 score on the validation set.

## Model Details

### Model Description

A binary image classifier that determines if Katara from the animated series "Avatar: The Last Airbender" is present in an image.

- **Developed by:** Your Name/Organization
- **Model type:** Image Classification
- **License:** MIT
- **Finetuned from model:** facebook/dinov2-small

## Uses

### Direct Use

This model can be used to:
- Identify Katara in screenshots or fan art
- Filter or categorize ATLA-related image collections
- Power fan applications that track character appearances

```python
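# Use a pipeline as a high-level helper
from PIL import Image
from transformers import pipeline

# Download the model from the Hub and build the classifier
pipe = pipeline("image-classification", model="lumenggan/katara-detector")

# Replace with the path to your own image
image = Image.open("yourimage.png")

# Returns a list of {"label": ..., "score": ...} dicts, highest score first
predictions = pipe(image)
print(predictions)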
```
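
The pipeline returns a list of label/score dictionaries, so a yes/no answer needs a small post-processing step. The sketch below shows one way to do that. The label name `"katara"`, the helper `is_katara`, and the 0.5 threshold are assumptions for illustration; check the model's `id2label` mapping for the actual label names and tune the threshold on your own images.

```python
from typing import Dict, List, Union


def is_katara(
    predictions: List[Dict[str, Union[str, float]]],
    positive_label: str = "katara",  # assumption: verify against the model's id2label
    threshold: float = 0.5,          # assumption: tune on your own data
) -> bool:
    """Return True if the positive label's score meets the threshold."""
    for prediction in predictions:
        if prediction["label"] == positive_label:
            return float(prediction["score"]) >= threshold
    return False  # positive label not present in the output


# Made-up scores in the same format the pipeline produces
sample = [
    {"label": "katara", "score": 0.93},
    {"label": "not_katara", "score": 0.07},
]
print(is_katara(sample))
```

Raising the threshold trades recall for precision, which may suit filtering large collections where false positives are costly.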

### Out-of-Scope Use

This model should not be used for:
- Critical identification tasks
- Monitoring or surveillance purposes
- Making judgments about real people

## Training Details

### Training Data

The model was trained on a custom dataset of Katara images and non-Katara images from Avatar: The Last Airbender. The dataset was split 80/20 for training and validation.
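
An 80/20 split like the one described above can be sketched with the standard library. This is an illustration only: the helper `split_dataset`, the seed, and the file names are hypothetical, and the card does not say whether the original split was stratified or seeded.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    items: Sequence[str],
    val_fraction: float = 0.2,  # 20% held out for validation, as in the card
    seed: int = 42,             # assumption: fixed seed for reproducibility
) -> Tuple[List[str], List[str]]:
    """Shuffle the items and return (train, validation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names standing in for the real dataset
paths = [f"img_{i:03d}.png" for i in range(100)]
train, val = split_dataset(paths)
print(len(train), len(val))
```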

### Training Procedure

The model was fine-tuned from DINOv2-small using the following techniques:
- Dropout regularization (rate=0.3)
- Weight decay (0.01-0.05)
- Cosine learning rate schedule with restarts
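
To illustrate the last item, here is a minimal sketch of a cosine schedule with hard restarts, without warmup. The number of cycles and the total step count are assumptions, since the card does not state them; `transformers` ships a comparable scheduler, `get_cosine_with_hard_restarts_schedule_with_warmup`, which may or may not be what was used here.

```python
import math


def cosine_with_restarts_lr(
    step: int,
    total_steps: int,
    base_lr: float = 2e-5,  # learning rate listed in this card
    num_cycles: int = 4,    # assumption: number of restarts is not stated
) -> float:
    """Learning rate at a given step for a cosine schedule with hard restarts."""
    if step >= total_steps:
        return 0.0
    progress = step / total_steps
    # Position within the current cycle, in [0, 1)
    cycle_position = (progress * num_cycles) % 1.0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * cycle_position))


# The rate decays within a cycle, then jumps back to base_lr at each restart
for step in (0, 50, 99, 100):
    print(step, cosine_with_restarts_lr(step, total_steps=400))
```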

#### Training Hyperparameters

- **Learning rate:** 2e-5
- **Weight decay:** 0.01-0.05
- **Epochs:** 5-15
- **Batch size:** 16 (effective 32 with gradient accumulation)
- **Training regime:** fp16 mixed precision
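
The values above could be collected into a configuration along these lines. The keys mirror common `transformers.TrainingArguments` field names, but this is only a sketch: the specific weight decay and epoch count are picked from the stated ranges as assumptions, and two accumulation steps is inferred from the effective batch size of 32.

```python
# Hypothetical configuration assembled from the hyperparameters in this card
training_config = {
    "learning_rate": 2e-5,
    "weight_decay": 0.01,                # assumption: lower end of 0.01-0.05
    "num_train_epochs": 10,              # assumption: within the 5-15 range
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 2,    # inferred: 16 * 2 = 32 effective
    "lr_scheduler_type": "cosine_with_restarts",
    "fp16": True,                        # fp16 mixed precision
}

# Gradients from several small batches are summed before each optimizer step
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
print(effective_batch_size)
```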

## Evaluation

### Metrics

- **Accuracy:** 96.0%
- **F1 Score:** 96.1%
- **Precision:** 96.8%
- **Recall:** 95.5%
- **ROC AUC:** 99.4%
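
As a quick consistency check, the F1 score is the harmonic mean of precision and recall, and the reported figures agree with each other. The helper name `f1_from_precision_recall` is illustrative.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in this card
f1 = f1_from_precision_recall(0.968, 0.955)
print(f"{f1:.3f}")  # 0.961
```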

## Technical Specifications

### Model Architecture

- Base model: facebook/dinov2-with-registers-small
- Custom classification head with dropout
- Input size: 224x224 RGB images

### Compute Infrastructure

- GPU: (e.g., NVIDIA T4, A100, etc.)
- Training time: Approximately 1-2 hours

## Model Card Contact

https://github.com/unLomTrois/