---
# Model Card Metadata (YAML Front Matter)
license: mit
base_model: microsoft/deberta-v3-small
tags:
- text-classification
- character-analysis
- plot-arc
- narrative-analysis
- deberta
- transformers
language: en
datasets:
- custom/plot-arc-balanced-101k
metrics:
- accuracy
- f1
- precision
- recall
model_type: sequence-classification
pipeline_tag: text-classification
widget:
- text: "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."
  example_title: "External Arc Example"
- text: "Maria struggles with crippling self-doubt after her mother's harsh words."
  example_title: "Internal Arc Example"
- text: "Captain Torres must infiltrate enemy lines while battling his own cowardice."
  example_title: "Both Arc Example"
- text: "A baker who makes bread every morning in his village shop."
  example_title: "No Arc Example"
library_name: transformers
---
# Plot Arc Classifier - DeBERTa Small
A fine-tuned DeBERTa-v3-small model for classifying character plot arc types in narrative text.
## Model Details
### Model Description
This model classifies character descriptions into four plot arc categories:
- **NONE (0)**: No discernible character development or plot arc
- **INTERNAL (1)**: Character growth driven by internal conflict/psychology
- **EXTERNAL (2)**: Character arc driven by external events/missions
- **BOTH (3)**: Character arc with both internal conflict and external drivers
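The four label ids map to class names as shown below, written in the `id2label`/`label2id` form that Hugging Face model configs use. This is a minimal sketch: the `to_class_name` helper is hypothetical, and whether the published `config.json` stores these names or the default generic `LABEL_0`…`LABEL_3` is an assumption to verify.

```python
# Label ids as listed in this model card
ID2LABEL = {0: "NONE", 1: "INTERNAL", 2: "EXTERNAL", 3: "BOTH"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def to_class_name(label: str) -> str:
    """Hypothetical helper: map a class name or a generic "LABEL_<id>" to the class name."""
    if label in LABEL2ID:
        return label
    prefix = "LABEL_"
    suffix = label[len(prefix):]
    if label.startswith(prefix) and suffix.isdigit() and int(suffix) in ID2LABEL:
        return ID2LABEL[int(suffix)]
    raise ValueError(f"Unknown label: {label!r}")


print(to_class_name("LABEL_2"))  # EXTERNAL
print(to_class_name("BOTH"))     # BOTH
```

If the config only carries generic names, passing `id2label=ID2LABEL, label2id=LABEL2ID` as keyword arguments to `from_pretrained` should override them in the loaded config.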
- **Model Type:** Text Classification (Sequence Classification)
- **Base Model:** microsoft/deberta-v3-small (~142M parameters, of which ~98M are in the embedding layer)
- **Language:** English
- **License:** MIT
### Model Architecture
- **Base:** DeBERTa-v3-Small (6 layers, hidden size 768; 44M backbone + 98M embedding parameters)
- **Task:** 4-class sequence classification
- **Input:** Character descriptions (max 512 tokens)
- **Output:** Logits for the 4 classes (apply softmax to obtain probabilities)
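Because the classification head emits raw logits, class probabilities come from a softmax over the four scores, and the prediction is the highest-probability class. A dependency-free sketch of that step (the PyTorch usage code in this card does the same with `torch.softmax` and `torch.argmax`); the logits in the example are made up.

```python
import math
from typing import List, Sequence, Tuple

CLASS_NAMES = ["NONE", "INTERNAL", "EXTERNAL", "BOTH"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: Sequence[float]) -> Tuple[str, float]:
    """Return (class name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# Hypothetical logits for one input
label, prob = decode([-1.2, 0.3, 2.1, 0.4])
print(label, round(prob, 3))  # EXTERNAL 0.722
```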
## Training Data
### Dataset Statistics
- **Total Examples:** 101,348
- **Training Split:** 91,213 examples (90%)
- **Validation Split:** 10,135 examples (10%)
- **Perfect Class Balance:** 25,337 examples per class
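The split sizes are consistent with rounding the 10% validation share up: 0.1 × 101,348 = 10,134.8, giving 10,135 validation examples and 91,213 for training. The sketch below reproduces that arithmetic with a seeded shuffle; the seed, the helper name, and the plain (non-stratified) shuffle are assumptions. The uneven per-class validation supports reported under Performance (2,495 to 2,571) suggest the split was not stratified by class, though the card does not say.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(items: Sequence[T], val_fraction: float = 0.1,
                    seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` and hold out ceil(val_fraction * n) for validation.

    Assumption: a plain random (non-stratified) split with a fixed, hypothetical seed.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = math.ceil(val_fraction * len(shuffled))
    return shuffled[n_val:], shuffled[:n_val]


train, val = train_val_split(range(101_348))
print(len(train), len(val))  # 91213 10135
```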
### Data Sources
- Systematic scanning of 1.8M+ character descriptions
- LLM validation using Llama-3.2-3B for quality assurance
- SHA-256-based deduplication to prevent data leakage (the same example appearing in both the training and validation splits)
- Balanced to an equal number of examples per plot arc type
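Deduplicating before the train/validation split keeps identical descriptions from landing on both sides. A minimal sketch using `hashlib.sha256`; the normalization step (lowercasing and collapsing whitespace), the keep-first policy, and the function names are assumptions, since the card does not say exactly what was hashed.

```python
import hashlib
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Assumed normalization: collapse whitespace runs, trim, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def sha256_dedup(texts: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each text, keyed by the SHA-256 of its normalized form."""
    seen = set()
    unique = []
    for text in texts:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


docs = [
    "Maria struggles with crippling self-doubt.",
    "maria struggles  with crippling self-doubt.",
    "A baker who makes bread every morning.",
]
print(len(sha256_dedup(docs)))  # 2
```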
### Class Distribution
| Class | Count | Percentage |
|-------|-------|------------|
| NONE | 25,337 | 25% |
| INTERNAL | 25,337 | 25% |
| EXTERNAL | 25,337 | 25% |
| BOTH | 25,337 | 25% |
## Performance
### Key Metrics
- **Accuracy:** 0.7286
- **F1 (Weighted):** 0.7283
- **F1 (Macro):** 0.7275
### Per-Class Performance
| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NONE | 0.697 | 0.613 | 0.653 | 2,495 |
| INTERNAL | 0.677 | 0.683 | 0.680 | 2,571 |
| EXTERNAL | 0.892 | 0.882 | 0.887 | 2,568 |
| BOTH | 0.652 | 0.732 | 0.690 | 2,501 |
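The aggregate F1 scores follow from the per-class rows: macro F1 is the unweighted mean over the four classes, and weighted F1 weights each class by its validation support. This sketch recomputes both from the table and checks each per-class F1 against the harmonic mean of its precision and recall. It uses only the rounded published numbers, so agreement holds to roughly three decimals.

```python
# Per-class rows from the table above: (precision, recall, f1, support)
PER_CLASS = {
    "NONE": (0.697, 0.613, 0.653, 2495),
    "INTERNAL": (0.677, 0.683, 0.680, 2571),
    "EXTERNAL": (0.892, 0.882, 0.887, 2568),
    "BOTH": (0.652, 0.732, 0.690, 2501),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Each reported F1 should match the harmonic mean of its P and R (up to rounding)
for name, (p, r, f, _) in PER_CLASS.items():
    assert abs(f1(p, r) - f) < 1e-3, name

total = sum(s for *_, s in PER_CLASS.values())
macro_f1 = sum(f for _, _, f, _ in PER_CLASS.values()) / len(PER_CLASS)
weighted_f1 = sum(f * s for _, _, f, s in PER_CLASS.values()) / total

print(total, round(macro_f1, 4), round(weighted_f1, 4))  # 10135 0.7275 0.7283
```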
### Training Details
- **Training Time:** 9.7 hours on Apple Silicon MPS
- **Final Training Loss:** 0.635
- **Epochs:** 3.86 (early stopping)
- **Batch Size:** 16 (effective: 32 with gradient accumulation)
- **Learning Rate:** 2e-5 with warmup
- **Optimizer:** AdamW with weight decay (0.01)
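The batch settings imply two gradient-accumulation steps (32 / 16, assuming a single MPS device) and roughly 2,851 optimizer steps per epoch over the 91,213 training examples, or about 11,000 steps for the 3.86 epochs run. The sketch below does that arithmetic and shows one common warmup shape, linear warmup followed by linear decay. The card only says "with warmup", so the schedule shape, the helper name, and any warmup length you pass in are assumptions.

```python
import math

# Figures from the model card
N_TRAIN = 91_213
PER_DEVICE_BATCH = 16
EFFECTIVE_BATCH = 32
PEAK_LR = 2e-5
EPOCHS_RUN = 3.86

# Assumes one device (MPS), so accumulation steps = effective / per-device batch
GRAD_ACCUM_STEPS = EFFECTIVE_BATCH // PER_DEVICE_BATCH

# Approximate optimizer steps; exact counts depend on how partial batches are handled
steps_per_epoch = math.ceil(N_TRAIN / EFFECTIVE_BATCH)
steps_run = round(EPOCHS_RUN * steps_per_epoch)


def linear_warmup_then_decay(step: int, total_steps: int, warmup_steps: int,
                             peak_lr: float = PEAK_LR) -> float:
    """Assumed schedule: ramp linearly from 0 to peak_lr, then decay linearly to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


print(GRAD_ACCUM_STEPS, steps_per_epoch, steps_run)  # 2 2851 11005
```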
## Confusion Matrix
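The per-class table pins down part of the confusion matrix. With rows as true classes, each diagonal cell (correct predictions) is recall × support, and each column total (all predictions of that class) is the diagonal cell divided by precision; the off-diagonal cells cannot be recovered from these numbers. The sketch below derives those counts. Because its inputs are rounded to three decimals, each count may be off by an example or two.

```python
# (precision, recall, support) per class, from the Per-Class Performance table
PER_CLASS = {
    "NONE": (0.697, 0.613, 2495),
    "INTERNAL": (0.677, 0.683, 2571),
    "EXTERNAL": (0.892, 0.882, 2568),
    "BOTH": (0.652, 0.732, 2501),
}

# Diagonal cell: correct predictions = recall * support (rows are true classes)
diagonal = {c: round(r * s) for c, (p, r, s) in PER_CLASS.items()}
# Column total: everything predicted as the class = correct / precision
predicted_totals = {c: round(diagonal[c] / p) for c, (p, r, s) in PER_CLASS.items()}

for c in PER_CLASS:
    print(f"{c:<9} correct={diagonal[c]:>5} predicted={predicted_totals[c]:>5}")

accuracy = sum(diagonal.values()) / sum(s for _, _, s in PER_CLASS.values())
print(f"accuracy ~ {accuracy:.4f}")
```

The derived column totals sum to 10,135, the size of the validation split, and the recovered accuracy (about 0.7283) agrees with the reported 0.7286 within that rounding.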

## Usage
### Basic Usage
```python
import torch
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification

# Load model and tokenizer (local directory or Hub repo id)
# DebertaV2Tokenizer requires the `sentencepiece` package
model_name = "plot-arc-classifier-deberta-small"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)
model.eval()

# Example text
text = "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."

# Tokenize (truncating to the 512-token limit) and predict without tracking gradients
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and pick the most likely class
probabilities = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).item()

# Class mapping (list index = label id)
class_names = ['NONE', 'INTERNAL', 'EXTERNAL', 'BOTH']
prediction = class_names[predicted_class]
confidence = probabilities[0, predicted_class].item()
print(f"Predicted class: {prediction} (confidence: {confidence:.3f})")
```
### Pipeline Usage
```python
from transformers import pipeline

# top_k=None returns scores for all four classes
# (replaces the deprecated return_all_scores=True)
classifier = pipeline(
    "text-classification",
    model="plot-arc-classifier-deberta-small",
    top_k=None,
)
result = classifier("Captain Torres must infiltrate enemy lines while battling his own cowardice.")
print(result)
```
## Example Classifications
| Expected Class | Type | Example | Prediction | Confidence |
|----------------|------|---------|------------|------------|
| **NONE** | Simple | *"Margaret runs the village bakery, making fresh bread every morning at 5 AM for the past thirty years."* | NONE ✅ | 0.997 |
| **NONE** | Nuanced | *"Dr. Harrison performs routine medical check-ups with methodical precision, maintaining professional distance while patients share their deepest fears about mortality."* | NONE ⚠️ | 0.581 |
| **INTERNAL** | Simple | *"Emma struggles with overwhelming anxiety after her father's harsh criticism, questioning her self-worth and abilities."* | INTERNAL ✅ | 0.983 |
| **INTERNAL** | Nuanced | *"The renowned pianist Clara finds herself paralyzed by perfectionism, her childhood trauma surfacing as she prepares for the performance that could define her legacy."* | INTERNAL ✅ | 0.733 |
| **EXTERNAL** | Simple | *"Knight Roderick embarks on a dangerous quest to retrieve the stolen crown from the dragon's lair."* | EXTERNAL ✅ | 0.717 |
| **EXTERNAL** | Nuanced | *"Master thief Elias infiltrates the heavily guarded fortress, disabling security systems and evading patrol routes, each obstacle requiring new techniques and tools to reach the vault."* | EXTERNAL ✅ | 0.711 |
| **BOTH** | Simple | *"Sarah must rescue her kidnapped daughter from the terrorist compound while confronting her own paralyzing guilt about being an absent mother."* | BOTH ⚠️ | 0.578 |
| **BOTH** | Nuanced | *"Archaeologist Sophia discovers an ancient artifact that could rewrite history, but must confront her own ethical boundaries and childhood abandonment issues as powerful forces try to silence her."* | BOTH ✅ | 0.926 |
**Results:** 8/8 correct predictions on these examples (⚠️ marks a correct prediction with confidence below 0.6).
## Limitations
- **Domain:** Optimized for character descriptions in narrative fiction
- **Length:** Maximum 512 tokens (longer texts are truncated)
- **Language:** English only
- **Context:** Works best with character-focused descriptions rather than plot summaries
- **Ambiguity:** Some edge cases may be inherently ambiguous between INTERNAL/BOTH
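Given the INTERNAL/BOTH ambiguity, one option is to route uncertain predictions to a human reviewer. A minimal sketch, assuming one input's scores arrive as a list of `{"label": ..., "score": ...}` dicts (the shape the `transformers` text-classification pipeline returns for a single input with `top_k=None`); `needs_review` and both thresholds are hypothetical and would need tuning on held-out data. The 0.6 default happens to separate the ⚠️ examples in the table above from the ✅ ones.

```python
from typing import List

# Hypothetical thresholds; tune on held-out data
MIN_CONFIDENCE = 0.6
MIN_MARGIN = 0.15


def needs_review(scores: List[dict],
                 min_confidence: float = MIN_CONFIDENCE,
                 min_margin: float = MIN_MARGIN) -> bool:
    """Flag a prediction when the top score is low or the top two classes are close."""
    ranked = sorted(scores, key=lambda d: d["score"], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    return (top["score"] < min_confidence
            or top["score"] - runner_up["score"] < min_margin)


# Hypothetical scores (top score taken from the BOTH "Simple" example above)
example = [
    {"label": "BOTH", "score": 0.578},
    {"label": "INTERNAL", "score": 0.31},
    {"label": "EXTERNAL", "score": 0.08},
    {"label": "NONE", "score": 0.032},
]
print(needs_review(example))  # True
```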
## Ethical Considerations
- **Bias:** Training data may contain genre/cultural biases toward certain character archetypes
- **Interpretation:** Classifications reflect Western narrative theory; other storytelling traditions may not map perfectly
- **Automation:** Should complement, not replace, human literary analysis
## Citation
```bibtex
@misc{plot_arc_classifier_2025,
title={Plot Arc Classifier - DeBERTa Small},
author={Claude Code Assistant},
year={2025},
url={https://github.com/your-org/plot-arc-classifier},
note={Fine-tuned DeBERTa-v3-small for character plot arc classification}
}
```
## Model Card Contact
For questions about this model, please open an issue in the repository or contact the maintainers.
---
*Model trained on 2025-09-02 using transformers library.*