---
license: mit
base_model: microsoft/deberta-v3-small
tags:
  - text-classification
  - character-analysis
  - plot-arc
  - narrative-analysis
  - deberta
  - transformers
language: en
datasets:
  - custom/plot-arc-balanced-101k
metrics:
  - accuracy
  - f1
  - precision
  - recall
model_type: sequence-classification
pipeline_tag: text-classification
widget:
  - text: "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."
    example_title: "External Arc Example"
  - text: "Maria struggles with crippling self-doubt after her mother's harsh words."
    example_title: "Internal Arc Example"
  - text: "Captain Torres must infiltrate enemy lines while battling his own cowardice."
    example_title: "Both Arc Example"
  - text: "A baker who makes bread every morning in his village shop."
    example_title: "No Arc Example"
library_name: transformers
---

# Plot Arc Classifier - DeBERTa Small

A fine-tuned DeBERTa-v3-small model for classifying character plot arc types in narrative text.

## Model Details

### Model Description

This model classifies character descriptions into four plot arc categories:
- **NONE (0)**: No discernible character development or plot arc
- **INTERNAL (1)**: Character growth driven by internal conflict/psychology  
- **EXTERNAL (2)**: Character arc driven by external events/missions
- **BOTH (3)**: Character arc with both internal conflict and external drivers
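The four categories above map to the model's output indices. A minimal sketch of that mapping (the dictionary names mirror the `id2label`/`label2id` convention used in `transformers` configs; the exact config contents are an assumption):

```python
# Label mapping for the four plot arc classes (assumed to match the
# model's config.json id2label/label2id fields).
id2label = {0: "NONE", 1: "INTERNAL", 2: "EXTERNAL", 3: "BOTH"}
label2id = {name: idx for idx, name in id2label.items()}
```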

**Model Type:** Text Classification (Sequence Classification)  
**Base Model:** microsoft/deberta-v3-small (~60M parameters)  
**Language:** English  
**License:** MIT  

### Model Architecture

- **Base:** DeBERTa-v3-Small (60M parameters)
- **Task:** 4-class sequence classification
- **Input:** Character descriptions (max 512 tokens)
- **Output:** Classification logits + probabilities for 4 classes

## Training Data

### Dataset Statistics
- **Total Examples:** 101,348
- **Training Split:** 91,213 examples (90%)
- **Validation Split:** 10,135 examples (10%)
- **Perfect Class Balance:** 25,337 examples per class

### Data Sources
- Systematic scanning of 1.8M+ character descriptions  
- LLM validation using Llama-3.2-3B for quality assurance
- SHA256-based deduplication to prevent data leakage
- Carefully curated and balanced dataset across all plot arc types
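The SHA256-based deduplication step can be sketched as follows. This is an illustrative reconstruction, not the actual pipeline code; the text normalization (strip + lowercase) before hashing is an assumption:

```python
import hashlib

def deduplicate(texts):
    """Keep the first occurrence of each example, dropping any later
    example whose normalized text produces the same SHA256 digest
    (normalization choice is an assumption)."""
    seen, unique = set(), []
    for text in texts:
        digest = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique
```

Hashing normalized text rather than comparing strings directly keeps memory bounded even when scanning millions of descriptions.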

### Class Distribution
| Class | Count | Percentage |
|-------|-------|------------|
| NONE | 25,337 | 25% |
| INTERNAL | 25,337 | 25% |
| EXTERNAL | 25,337 | 25% |
| BOTH | 25,337 | 25% |

## Performance

### Key Metrics
- **Accuracy:** 0.7286
- **F1 (Weighted):** 0.7283
- **F1 (Macro):** 0.7275

### Per-Class Performance
| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NONE | 0.697 | 0.613 | 0.653 | 2,495 |
| INTERNAL | 0.677 | 0.683 | 0.680 | 2,571 |
| EXTERNAL | 0.892 | 0.882 | 0.887 | 2,568 |
| BOTH | 0.652 | 0.732 | 0.690 | 2,501 |
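The macro F1 reported above is the unweighted mean of the per-class F1 scores. A self-contained sketch of that computation on toy labels (the card's actual numbers come from the 10,135-example validation set, not this code):

```python
def macro_f1(y_true, y_pred, n_classes=4):
    """Macro-averaged F1: per-class F1 from one-vs-rest counts,
    then an unweighted mean across classes."""
    f1s = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / n_classes
```

Weighted F1 differs only in averaging per-class scores by their support instead of uniformly, which is why the two values above are so close on this nearly balanced validation set.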

### Training Details
- **Training Time:** 9.7 hours on Apple Silicon MPS
- **Final Training Loss:** 0.635
- **Epochs:** 3.86 (early stopping)
- **Batch Size:** 16 (effective: 32 with gradient accumulation)  
- **Learning Rate:** 2e-5 with warmup
- **Optimizer:** AdamW with weight decay (0.01)
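The hyperparameters above can be sketched as a `transformers` `TrainingArguments` configuration. Batch size, gradient accumulation, learning rate, and weight decay come from the list; the warmup ratio and epoch cap are assumptions (early stopping ended the actual run at ~3.86 epochs):

```python
from transformers import TrainingArguments

# Hedged reconstruction of the training setup described above.
args = TrainingArguments(
    output_dir="plot-arc-classifier-deberta-small",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,  # effective batch size 32
    learning_rate=2e-5,
    warmup_ratio=0.1,               # "with warmup"; exact schedule unknown
    weight_decay=0.01,              # AdamW weight decay
    num_train_epochs=5,             # cap; early stopping triggered earlier
)
```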


## Confusion Matrix

![Confusion Matrix](images/confusion_matrix.png)

## Usage

### Basic Usage

```python
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
import torch

# Load model and tokenizer
model_name = "plot-arc-classifier-deberta-small"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)

# Example text
text = "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1)

# Class mapping
class_names = ['NONE', 'INTERNAL', 'EXTERNAL', 'BOTH']
prediction = class_names[predicted_class.item()]
confidence = probabilities[0][predicted_class].item()

print(f"Predicted class: {prediction} (confidence: {confidence:.3f})")
```

### Pipeline Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="plot-arc-classifier-deberta-small",
    top_k=None,  # return scores for all four classes (return_all_scores is deprecated)
)

result = classifier("Captain Torres must infiltrate enemy lines while battling his own cowardice.")
print(result)
```

## Example Classifications

| Class | Type | Example | Prediction | Confidence |
|-------|------|---------|------------|------------|
| **NONE** | Simple | *"Margaret runs the village bakery, making fresh bread every morning at 5 AM for the past thirty years."* | NONE ✅ | 0.997 |
| **NONE** | Nuanced | *"Dr. Harrison performs routine medical check-ups with methodical precision, maintaining professional distance while patients share their deepest fears about mortality."* | NONE ⚠️ | 0.581 |
| **INTERNAL** | Simple | *"Emma struggles with overwhelming anxiety after her father's harsh criticism, questioning her self-worth and abilities."* | INTERNAL ✅ | 0.983 |
| **INTERNAL** | Nuanced | *"The renowned pianist Clara finds herself paralyzed by perfectionism, her childhood trauma surfacing as she prepares for the performance that could define her legacy."* | INTERNAL ✅ | 0.733 |
| **EXTERNAL** | Simple | *"Knight Roderick embarks on a dangerous quest to retrieve the stolen crown from the dragon's lair."* | EXTERNAL ✅ | 0.717 |
| **EXTERNAL** | Nuanced | *"Master thief Elias infiltrates the heavily guarded fortress, disabling security systems and evading patrol routes, each obstacle requiring new techniques and tools to reach the vault."* | EXTERNAL ✅ | 0.711 |
| **BOTH** | Simple | *"Sarah must rescue her kidnapped daughter from the terrorist compound while confronting her own paralyzing guilt about being an absent mother."* | BOTH ⚠️ | 0.578 |
| **BOTH** | Nuanced | *"Archaeologist Sophia discovers an ancient artifact that could rewrite history, but must confront her own ethical boundaries and childhood abandonment issues as powerful forces try to silence her."* | BOTH ✅ | 0.926 |

**Results:** 8/8 correct predictions (100% accuracy)

## Limitations

- **Domain:** Optimized for character descriptions in narrative fiction
- **Length:** Maximum 512 tokens (longer texts are truncated)
- **Language:** English only
- **Context:** Works best with character-focused descriptions rather than plot summaries
- **Ambiguity:** Some edge cases may be inherently ambiguous between INTERNAL/BOTH

## Ethical Considerations

- **Bias:** Training data may contain genre/cultural biases toward certain character archetypes
- **Interpretation:** Classifications reflect Western narrative theory; other storytelling traditions may not map perfectly
- **Automation:** Should complement, not replace, human literary analysis

## Citation

```bibtex
@misc{plot_arc_classifier_2025,
  title={Plot Arc Classifier - DeBERTa Small},
  author={Claude Code Assistant},
  year={2025},
  url={https://github.com/your-org/plot-arc-classifier},
  note={Fine-tuned DeBERTa-v3-small for character plot arc classification}
}
```

## Model Card Contact

For questions about this model, please open an issue in the repository or contact the maintainers.

---

*Model trained on 2025-09-02 using transformers library.*