---
# Model Card Metadata (YAML Front Matter)
license: mit
base_model: microsoft/deberta-v3-small
tags:
- text-classification
- character-analysis
- plot-arc
- narrative-analysis
- deberta
- transformers
language: en
datasets:
- custom/plot-arc-balanced-101k
metrics:
- accuracy
- f1
- precision
- recall
model_type: sequence-classification
pipeline_tag: text-classification
widget:
  - text: "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."
    example_title: "External Arc Example"
  - text: "Maria struggles with crippling self-doubt after her mother's harsh words."
    example_title: "Internal Arc Example"
  - text: "Captain Torres must infiltrate enemy lines while battling his own cowardice."
    example_title: "Both Arc Example"
  - text: "A baker who makes bread every morning in his village shop."
    example_title: "No Arc Example"
library_name: transformers
---
# Plot Arc Classifier - DeBERTa Small
A fine-tuned DeBERTa-v3-small model for classifying character plot arc types in narrative text.
## Model Details
### Model Description
This model classifies character descriptions into four plot arc categories:
- **NONE (0)**: No discernible character development or plot arc
- **INTERNAL (1)**: Character growth driven by internal conflict/psychology
- **EXTERNAL (2)**: Character arc driven by external events/missions
- **BOTH (3)**: Character arc with both internal conflict and external drivers
**Model Type:** Text Classification (Sequence Classification)
**Base Model:** microsoft/deberta-v3-small (about 44M backbone parameters, plus about 98M embedding parameters for its 128K-token vocabulary)
**Language:** English
**License:** MIT
### Model Architecture
- **Base:** DeBERTa-v3-Small (6 layers, hidden size 768)
- **Task:** 4-class sequence classification
- **Input:** Character descriptions (max 512 tokens)
- **Output:** Classification logits + probabilities for 4 classes
## Training Data
### Dataset Statistics
- **Total Examples:** 101,348
- **Training Split:** 91,213 examples (90%)
- **Validation Split:** 10,135 examples (10%)
- **Perfect Class Balance:** 25,337 examples per class
### Data Sources
- Systematic scanning of 1.8M+ character descriptions
- LLM validation using Llama-3.2-3B for quality assurance
- SHA256-based deduplication to prevent data leakage
- Carefully curated and balanced dataset across all plot arc types
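The hash-based deduplication step can be sketched as follows. This is a minimal illustration, not the pipeline actually used for this dataset: the card does not say how text was normalized before hashing, so the `normalize` helper (lowercasing and collapsing whitespace) and the `dedupe` function name are assumptions.

```python
import hashlib


def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def dedupe(texts):
    """Keep the first occurrence of each text, comparing SHA-256 digests."""
    seen = set()
    unique = []
    for text in texts:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


descriptions = [
    "A baker who makes bread every morning.",
    "a baker  who makes bread every morning.",  # same text after normalization
    "A knight on a perilous quest.",
]
print(dedupe(descriptions))
```

Deduplicating before the train/validation split is what prevents the same description from landing on both sides of it.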
### Class Distribution
| Class | Count | Percentage |
|-------|-------|------------|
| NONE | 25,337 | 25% |
| INTERNAL | 25,337 | 25% |
| EXTERNAL | 25,337 | 25% |
| BOTH | 25,337 | 25% |
## Performance
### Key Metrics
- **Accuracy:** 0.7286
- **F1 (Weighted):** 0.7283
- **F1 (Macro):** 0.7275
### Per-Class Performance
| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NONE | 0.697 | 0.613 | 0.653 | 2,495 |
| INTERNAL | 0.677 | 0.683 | 0.680 | 2,571 |
| EXTERNAL | 0.892 | 0.882 | 0.887 | 2,568 |
| BOTH | 0.652 | 0.732 | 0.690 | 2,501 |
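As a consistency check, the macro and weighted F1 scores can be recomputed from the per-class table. The sketch below only uses the F1 and support values listed above; the variable names are illustrative.

```python
# (F1, support) per class, copied from the per-class table.
per_class = {
    "NONE": (0.653, 2495),
    "INTERNAL": (0.680, 2571),
    "EXTERNAL": (0.887, 2568),
    "BOTH": (0.690, 2501),
}

total_support = sum(support for _, support in per_class.values())

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted F1: mean of the per-class F1 scores, weighted by support.
weighted_f1 = sum(f1 * support for f1, support in per_class.values()) / total_support

print(f"support={total_support} macro={macro_f1:.4f} weighted={weighted_f1:.4f}")
```

The total support equals the 10,135 validation examples, and both averages agree with the reported key metrics to four decimal places.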
### Training Details
- **Training Time:** 9.7 hours on Apple Silicon MPS
- **Final Training Loss:** 0.635
- **Epochs:** 3.86 (early stopping)
- **Batch Size:** 16 (effective: 32 with gradient accumulation)
- **Learning Rate:** 2e-5 with warmup
- **Optimizer:** AdamW with weight decay (0.01)
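The batch-size figures imply two gradient accumulation steps per optimizer update. The arithmetic below is a rough sketch that assumes a single device and ignores how the training loop treats the last partial batch, so the update counts are approximate rather than the exact numbers from the run.

```python
import math

train_examples = 91_213   # training split size
per_step_batch = 16       # batch size per forward/backward pass
effective_batch = 32      # batch size per optimizer update
epochs = 3.86             # epochs completed before early stopping

# Forward/backward passes accumulated before each optimizer update.
accumulation_steps = effective_batch // per_step_batch

# Micro-batches needed to see every training example once.
micro_batches_per_epoch = math.ceil(train_examples / per_step_batch)

# Approximate optimizer updates per epoch and over the whole run.
approx_updates_per_epoch = micro_batches_per_epoch / accumulation_steps
approx_total_updates = approx_updates_per_epoch * epochs

print(accumulation_steps, micro_batches_per_epoch, round(approx_total_updates))
```

Under these assumptions the run amounts to roughly 11,000 optimizer updates.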
## Confusion Matrix
![Confusion Matrix](images/confusion_matrix.png)
## Usage
### Basic Usage
```python
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
import torch

# Load model and tokenizer.
# model_name must resolve to a local directory or a full Hub repository id.
model_name = "plot-arc-classifier-deberta-small"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)

# Example text
text = "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."

# Tokenize and predict (inputs longer than 512 tokens are truncated)
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()

# Class mapping (index order matches the label ids 0-3)
class_names = ['NONE', 'INTERNAL', 'EXTERNAL', 'BOTH']
prediction = class_names[predicted_class]
confidence = probabilities[0, predicted_class].item()
print(f"Predicted class: {prediction} (confidence: {confidence:.3f})")
```
### Pipeline Usage
```python
from transformers import pipeline
classifier = pipeline(
    "text-classification",
    model="plot-arc-classifier-deberta-small",
    top_k=None,  # return scores for all four classes (replaces the deprecated return_all_scores=True)
)
result = classifier("Captain Torres must infiltrate enemy lines while battling his own cowardice.")
print(result)
```
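Depending on the library version and arguments, the pipeline may return a flat list of label/score dictionaries or the same list nested one level deeper, and labels may appear as generic `LABEL_0`-style names if the model config does not carry class names. The helper below is a minimal sketch for post-processing either shape. The function name `top_prediction`, the 0.6 low-confidence threshold, and the sample scores are all hypothetical and are not real model output.

```python
CLASS_NAMES = ["NONE", "INTERNAL", "EXTERNAL", "BOTH"]


def top_prediction(result, threshold=0.6):
    """Return (label, score, is_low_confidence) for one pipeline result."""
    # Unwrap one level of nesting if the scores arrive as [[{...}, ...]].
    scores = result[0] if result and isinstance(result[0], list) else result
    best = max(scores, key=lambda item: item["score"])
    label = best["label"]
    # Map generic "LABEL_<id>" names onto the class names from this card.
    if label.startswith("LABEL_"):
        label = CLASS_NAMES[int(label.split("_", 1)[1])]
    return label, best["score"], best["score"] < threshold


# Illustrative scores only, not produced by the model.
sample = [[
    {"label": "LABEL_0", "score": 0.05},
    {"label": "LABEL_1", "score": 0.20},
    {"label": "LABEL_2", "score": 0.17},
    {"label": "LABEL_3", "score": 0.58},
]]
print(top_prediction(sample))
```

Flagging low-confidence predictions this way makes it easier to route borderline cases to a human reader.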
## Example Classifications
| Class | Type | Example | Prediction | Confidence |
|-------|------|---------|------------|------------|
| **NONE** | Simple | *"Margaret runs the village bakery, making fresh bread every morning at 5 AM for the past thirty years."* | NONE ✅ | 0.997 |
| **NONE** | Nuanced | *"Dr. Harrison performs routine medical check-ups with methodical precision, maintaining professional distance while patients share their deepest fears about mortality."* | NONE ⚠️ | 0.581 |
| **INTERNAL** | Simple | *"Emma struggles with overwhelming anxiety after her father's harsh criticism, questioning her self-worth and abilities."* | INTERNAL ✅ | 0.983 |
| **INTERNAL** | Nuanced | *"The renowned pianist Clara finds herself paralyzed by perfectionism, her childhood trauma surfacing as she prepares for the performance that could define her legacy."* | INTERNAL ✅ | 0.733 |
| **EXTERNAL** | Simple | *"Knight Roderick embarks on a dangerous quest to retrieve the stolen crown from the dragon's lair."* | EXTERNAL ✅ | 0.717 |
| **EXTERNAL** | Nuanced | *"Master thief Elias infiltrates the heavily guarded fortress, disabling security systems and evading patrol routes, each obstacle requiring new techniques and tools to reach the vault."* | EXTERNAL ✅ | 0.711 |
| **BOTH** | Simple | *"Sarah must rescue her kidnapped daughter from the terrorist compound while confronting her own paralyzing guilt about being an absent mother."* | BOTH ⚠️ | 0.578 |
| **BOTH** | Nuanced | *"Archaeologist Sophia discovers an ancient artifact that could rewrite history, but must confront her own ethical boundaries and childhood abandonment issues as powerful forces try to silence her."* | BOTH ✅ | 0.926 |
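**Results:** 8/8 correct predictions on these eight illustrative examples. ⚠️ marks a correct prediction made with low confidence (about 0.58 in both cases). This small sample is not representative of overall performance; accuracy on the validation split is 0.7286.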
## Limitations
- **Domain:** Optimized for character descriptions in narrative fiction
- **Length:** Maximum 512 tokens (longer texts are truncated)
- **Language:** English only
- **Context:** Works best with character-focused descriptions rather than plot summaries
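- **Ambiguity:** Some edge cases are inherently ambiguous, particularly between INTERNAL and BOTH. NONE, INTERNAL, and BOTH all score noticeably lower (F1 between 0.65 and 0.69) than EXTERNAL (F1 0.887)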
## Ethical Considerations
- **Bias:** Training data may contain genre/cultural biases toward certain character archetypes
- **Interpretation:** Classifications reflect Western narrative theory; other storytelling traditions may not map perfectly
- **Automation:** Should complement, not replace, human literary analysis
## Citation
```bibtex
@misc{plot_arc_classifier_2025,
  title={Plot Arc Classifier - DeBERTa Small},
  author={Claude Code Assistant},
  year={2025},
  url={https://github.com/your-org/plot-arc-classifier},
  note={Fine-tuned DeBERTa-v3-small for character plot arc classification}
}
```
## Model Card Contact
For questions about this model, please open an issue in the repository or contact the maintainers.
---
*Model trained on 2025-09-02 using the transformers library.*