---
library_name: transformers
tags:
- image-classification
- vision
- avatar
- katara
datasets:
- deepghs/nozomi_standalone_full
language:
- en
metrics:
- f1
base_model:
- facebook/dinov2-small
pipeline_tag: image-classification
---

# Model Card for Katara Detector

This model identifies whether an image contains Katara from Avatar: The Last Airbender. It achieves 96% accuracy and a 96.1% F1 score on the validation set.

## Model Details

### Model Description

A binary image classifier that determines whether Katara from the animated series "Avatar: The Last Airbender" is present in an image.

- **Developed by:** Your Name/Organization
- **Model type:** Image Classification
- **License:** MIT
- **Finetuned from model:** facebook/dinov2-small

## Uses

### Direct Use

This model can be used to:

- Identify Katara in screenshots or fan art
- Filter or categorize ATLA-related image collections
- Power fan applications that track character appearances

```python
# Use a pipeline as a high-level helper
from PIL import Image
from transformers import pipeline

pipe = pipeline("image-classification", model="lumenggan/katara-detector")

image = Image.open("yourimage.png")
print(pipe(image))
```

### Out-of-Scope Use

This model should not be used for:

- Critical identification tasks
- Monitoring or surveillance purposes
- Making judgments about real people

## Training Details

### Training Data

The model was trained on a custom dataset of Katara and non-Katara images from Avatar: The Last Airbender. The dataset was split 80/20 into training and validation sets.

### Training Procedure

The model was fine-tuned from DINOv2-small using the following techniques (an illustrative training sketch is included at the end of this card):

- Dropout regularization (rate=0.3)
- Weight decay (0.01-0.05)
- Cosine learning rate schedule with restarts

#### Training Hyperparameters

- **Learning rate:** 2e-5
- **Weight decay:** 0.01-0.05
- **Epochs:** 5-15
- **Batch size:** 16 (effective 32 with gradient accumulation)
- **Training regime:** fp16 mixed precision

## Evaluation

### Metrics

Measured on the validation split (a sketch of the metric computation is included at the end of this card):

- **Accuracy:** 96.0%
- **F1 Score:** 96.1%
- **Precision:** 96.8%
- **Recall:** 95.5%
- **ROC AUC:** 99.4%

## Technical Specifications

### Model Architecture

- Base model: facebook/dinov2-with-registers-small
- Custom classification head with dropout
- Input size: 224x224 RGB images

### Compute Infrastructure

- GPU: (e.g., NVIDIA T4, A100, etc.)
- Training time: Approximately 1-2 hours

## Model Card Contact

https://github.com/unLomTrois/
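
## Training Sketch

The original training script is not published with this card. The following is a minimal sketch of how the fine-tuning described under "Training Details" could be set up with the `transformers` `Trainer`. The dataset objects (`train_ds`, `val_ds`), the label names, and the exact epoch and weight-decay values are assumptions taken from the ranges listed above, and the head layout is a guess based on the "custom classification head with dropout" description.

```python
# Illustrative fine-tuning sketch based on the hyperparameters listed in this card.
# train_ds / val_ds are assumed to be preprocessed datasets yielding
# {"pixel_values": ..., "labels": ...}; they are not defined here.
import torch.nn as nn
from transformers import (
    AutoImageProcessor,
    AutoModelForImageClassification,
    Trainer,
    TrainingArguments,
)

base_model = "facebook/dinov2-small"  # the card also mentions the with-registers variant
# The processor resizes/normalizes images (224x224 crops) when building train_ds / val_ds.
processor = AutoImageProcessor.from_pretrained(base_model)

model = AutoModelForImageClassification.from_pretrained(
    base_model,
    num_labels=2,
    id2label={0: "not_katara", 1: "katara"},  # label names assumed
    label2id={"not_katara": 0, "katara": 1},
)

# "Custom classification head with dropout" -- assumed here to be Dropout(0.3) + Linear,
# replacing the default single linear head.
model.classifier = nn.Sequential(
    nn.Dropout(p=0.3),
    nn.Linear(model.classifier.in_features, 2),
)

args = TrainingArguments(
    output_dir="katara-detector",
    learning_rate=2e-5,
    weight_decay=0.01,               # card reports a 0.01-0.05 range
    num_train_epochs=10,             # card reports 5-15 epochs
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size of 32
    lr_scheduler_type="cosine_with_restarts",
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
)
trainer.train()
```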
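
## Evaluation Metrics Sketch

The reported accuracy, F1, precision, recall, and ROC AUC were measured on the 20% validation split. A metric function along the following lines (scikit-learn based; an assumption, not the original evaluation code) can be passed to the `Trainer` above via `compute_metrics=` to produce the same set of quantities.

```python
# Sketch of the validation metrics reported in this card (assumed implementation).
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Numerically stable softmax; ROC AUC uses the probability of the positive class.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),
        "precision": precision_score(labels, preds),
        "recall": recall_score(labels, preds),
        "roc_auc": roc_auc_score(labels, probs[:, 1]),
    }
```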