---
library_name: transformers
tags:
- image-classification
- vision
- avatar
- katara
datasets:
- deepghs/nozomi_standalone_full
language:
- en
metrics:
- f1
base_model:
- facebook/dinov2-small
pipeline_tag: image-classification
---

# Model Card for Katara Detector

This model identifies whether an image contains Katara from Avatar: The Last Airbender. It achieves 96% accuracy and a 96.1% F1 score on the validation set.

## Model Details

### Model Description

A binary image classifier that determines whether Katara from the animated series "Avatar: The Last Airbender" is present in an image.

- **Developed by:** Your Name/Organization
- **Model type:** Image Classification
- **License:** MIT
- **Finetuned from model:** facebook/dinov2-small

## Uses

### Direct Use

This model can be used to:

- Identify Katara in screenshots or fan art
- Filter or categorize ATLA-related image collections
- Power fan applications that track character appearances

```python
# Use a pipeline as a high-level helper
from PIL import Image
from transformers import pipeline

pipe = pipeline("image-classification", model="lumenggan/katara-detector")

image = Image.open("yourimage.png")
print(pipe(image))
```

### Out-of-Scope Use

This model should not be used for:

- Critical identification tasks
- Monitoring or surveillance purposes
- Making judgments about real people

## Training Details

### Training Data

The model was trained on a custom dataset of Katara and non-Katara images from Avatar: The Last Airbender. The dataset was split 80/20 into training and validation sets.

### Training Procedure

The model was fine-tuned from DINOv2-small using the following techniques (an illustrative training sketch is included at the end of this card):

- Dropout regularization (rate=0.3)
- Weight decay (0.01-0.05)
- Cosine learning rate schedule with restarts

#### Training Hyperparameters

- **Learning rate:** 2e-5
- **Weight decay:** 0.01-0.05
- **Epochs:** 5-15
- **Batch size:** 16 (effective 32 with gradient accumulation)
- **Training regime:** fp16 mixed precision

## Evaluation

### Metrics

Measured on the validation split (a sketch of the metric computation is included at the end of this card):

- **Accuracy:** 96.0%
- **F1 Score:** 96.1%
- **Precision:** 96.8%
- **Recall:** 95.5%
- **ROC AUC:** 99.4%

## Technical Specifications

### Model Architecture

- Base model: facebook/dinov2-with-registers-small
- Custom classification head with dropout
- Input size: 224x224 RGB images

### Compute Infrastructure

- GPU: (e.g., NVIDIA T4, A100, etc.)
- Training time: Approximately 1-2 hours

## Model Card Contact

https://github.com/unLomTrois/
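
## Training Sketch

The original training script is not published with this card. The following is a minimal sketch of how the fine-tuning described under "Training Details" could be set up with the `transformers` `Trainer`. The dataset objects (`train_ds`, `val_ds`), the label names, and the exact epoch and weight-decay values are assumptions taken from the ranges listed above, and the head layout is a guess based on the "custom classification head with dropout" description.

```python
# Illustrative fine-tuning sketch based on the hyperparameters listed in this card.
# train_ds / val_ds are assumed to be preprocessed datasets yielding
# {"pixel_values": ..., "labels": ...}; they are not defined here.
import torch.nn as nn
from transformers import (
    AutoImageProcessor,
    AutoModelForImageClassification,
    Trainer,
    TrainingArguments,
)

base_model = "facebook/dinov2-small"  # the card also mentions the with-registers variant
# The processor resizes/normalizes images (224x224 crops) when building train_ds / val_ds.
processor = AutoImageProcessor.from_pretrained(base_model)

model = AutoModelForImageClassification.from_pretrained(
    base_model,
    num_labels=2,
    id2label={0: "not_katara", 1: "katara"},  # label names assumed
    label2id={"not_katara": 0, "katara": 1},
)

# "Custom classification head with dropout" -- assumed here to be Dropout(0.3) + Linear,
# replacing the default single linear head.
model.classifier = nn.Sequential(
    nn.Dropout(p=0.3),
    nn.Linear(model.classifier.in_features, 2),
)

args = TrainingArguments(
    output_dir="katara-detector",
    learning_rate=2e-5,
    weight_decay=0.01,               # card reports a 0.01-0.05 range
    num_train_epochs=10,             # card reports 5-15 epochs
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size of 32
    lr_scheduler_type="cosine_with_restarts",
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
)
trainer.train()
```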
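
## Evaluation Metrics Sketch

The reported accuracy, F1, precision, recall, and ROC AUC were measured on the 20% validation split. A metric function along the following lines (scikit-learn based; an assumption, not the original evaluation code) can be passed to the `Trainer` above via `compute_metrics=` to produce the same set of quantities.

```python
# Sketch of the validation metrics reported in this card (assumed implementation).
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Numerically stable softmax; ROC AUC uses the probability of the positive class.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),
        "precision": precision_score(labels, preds),
        "recall": recall_score(labels, preds),
        "roc_auc": roc_auc_score(labels, probs[:, 1]),
    }
```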