---
license: mit
base_model: microsoft/deberta-v3-small
tags:
- text-classification
- character-analysis
- plot-arc
- narrative-analysis
- deberta
- transformers
language: en
datasets:
- custom/plot-arc-balanced-101k
metrics:
- accuracy
- f1
- precision
- recall
model_type: sequence-classification
pipeline_tag: text-classification
widget:
- text: "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."
  example_title: "External Arc Example"
- text: "Maria struggles with crippling self-doubt after her mother's harsh words."
  example_title: "Internal Arc Example"
- text: "Captain Torres must infiltrate enemy lines while battling his own cowardice."
  example_title: "Both Arc Example"
- text: "A baker who makes bread every morning in his village shop."
  example_title: "No Arc Example"
library_name: transformers
---

# Plot Arc Classifier - DeBERTa Small

A fine-tuned DeBERTa-v3-small model for classifying character plot arc types in narrative text.

## Model Details

### Model Description

This model classifies character descriptions into four plot arc categories:

- **NONE (0)**: No discernible character development or plot arc
- **INTERNAL (1)**: Character growth driven by internal conflict/psychology
- **EXTERNAL (2)**: Character arc driven by external events/missions
- **BOTH (3)**: Character arc with both internal conflict and external drivers

**Model Type:** Text Classification (Sequence Classification)
**Base Model:** microsoft/deberta-v3-small (~44M backbone parameters, plus ~98M in the embedding layer)
**Language:** English
**License:** MIT

### Model Architecture

- **Base:** DeBERTa-v3-Small (~44M backbone parameters)
- **Task:** 4-class sequence classification
- **Input:** Character descriptions (max 512 tokens)
- **Output:** Classification logits + probabilities for 4 classes
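
The relationship between the logits and the per-class probabilities can be sketched without any model code. This is a minimal illustration, assuming the label order listed under Model Description; the `example_logits` values are hypothetical and not real model output.

```python
import math

# Label order as documented in this card: index -> class name.
ID2LABEL = {0: "NONE", 1: "INTERNAL", 2: "EXTERNAL", 3: "BOTH"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one input; real values come from the model.
example_logits = [-1.2, 0.3, 2.9, 0.8]
label, prob = decode(example_logits)
print(label, round(prob, 3))
```

The predicted class is the argmax of the logits; softmax only rescales them so that the four scores sum to 1.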

## Training Data

### Dataset Statistics

- **Total Examples:** 101,348
- **Training Split:** 91,213 examples (90%)
- **Validation Split:** 10,135 examples (10%)
- **Perfect Class Balance:** 25,337 examples per class

### Data Sources

- Systematic scanning of 1.8M+ character descriptions
- LLM validation using Llama-3.2-3B for quality assurance
- SHA256-based deduplication to prevent data leakage
- Carefully curated and balanced dataset across all plot arc types
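
Hash-based deduplication of the kind listed above could look roughly like the sketch below. The normalization step (lowercasing and collapsing whitespace) and the helper names `text_fingerprint` and `deduplicate` are assumptions made for illustration; the card does not describe the actual pipeline.

```python
import hashlib


def text_fingerprint(text):
    """SHA-256 hex digest of a case- and whitespace-normalized description."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(descriptions):
    """Keep the first occurrence of each fingerprint, preserving input order."""
    seen = set()
    unique = []
    for text in descriptions:
        digest = text_fingerprint(text)
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


samples = [
    "A baker who makes bread every morning.",
    "a baker  who makes bread every morning.",  # differs only in case/spacing
    "Maria struggles with crippling self-doubt.",
]
unique_samples = deduplicate(samples)
print(len(unique_samples))
```

Running the deduplication before the train/validation split is what keeps identical descriptions from landing on both sides of it.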

### Class Distribution

| Class | Count | Percentage |
|-------|-------|------------|
| NONE | 25,337 | 25% |
| INTERNAL | 25,337 | 25% |
| EXTERNAL | 25,337 | 25% |
| BOTH | 25,337 | 25% |
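
As a quick consistency check, the reported counts can be recombined in a few lines. This only restates numbers already given in this card; the variable names are illustrative.

```python
# Counts as reported in this model card.
per_class = {"NONE": 25_337, "INTERNAL": 25_337, "EXTERNAL": 25_337, "BOTH": 25_337}
train_size, val_size = 91_213, 10_135

total = sum(per_class.values())
shares = {name: count / total for name, count in per_class.items()}
train_fraction = train_size / total

print(total, train_size + val_size, round(train_fraction, 3))
print(shares)
```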

## Performance

### Key Metrics

- **Accuracy:** 0.7286
- **F1 (Weighted):** 0.7283
- **F1 (Macro):** 0.7275

### Per-Class Performance

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NONE | 0.697 | 0.613 | 0.653 | 2,495 |
| INTERNAL | 0.677 | 0.683 | 0.680 | 2,571 |
| EXTERNAL | 0.892 | 0.882 | 0.887 | 2,568 |
| BOTH | 0.652 | 0.732 | 0.690 | 2,501 |
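
The aggregate metrics follow from the per-class table. The sketch below recomputes them from the rounded precision, recall, and support values, so the results can differ from the reported figures in the third or fourth decimal place.

```python
# (precision, recall, support) per class, copied from the table above.
rows = {
    "NONE": (0.697, 0.613, 2495),
    "INTERNAL": (0.677, 0.683, 2571),
    "EXTERNAL": (0.892, 0.882, 2568),
    "BOTH": (0.652, 0.732, 2501),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {name: f1(p, r) for name, (p, r, _) in rows.items()}
total_support = sum(s for _, _, s in rows.values())

# Macro F1: unweighted mean over classes.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
# Weighted F1: mean weighted by each class's support.
weighted_f1 = sum(per_class_f1[n] * rows[n][2] for n in rows) / total_support
# Accuracy: correctly classified examples (recall * support) over all examples.
accuracy = sum(r * s for _, r, s in rows.values()) / total_support

print(f"macro={macro_f1:.4f} weighted={weighted_f1:.4f} accuracy={accuracy:.4f}")
```

Because the validation classes are nearly the same size, the macro and weighted F1 scores end up almost identical.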

### Training Details

- **Training Time:** 9.7 hours on Apple Silicon MPS
- **Final Training Loss:** 0.635
- **Epochs:** 3.86 (early stopping)
- **Batch Size:** 16 (effective: 32 with gradient accumulation)
- **Learning Rate:** 2e-5 with warmup
- **Optimizer:** AdamW with weight decay (0.01)
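
The batch-size figures imply a gradient accumulation factor and a rough number of optimizer updates. The arithmetic below is a back-of-the-envelope sketch that assumes a single device and counts the final partial batch of each epoch as one update; the actual training script may have counted steps differently.

```python
import math

# Values reported in this card.
train_examples = 91_213
per_device_batch = 16
effective_batch = 32
epochs_run = 3.86

# Mini-batches accumulated before each optimizer update.
accumulation_steps = effective_batch // per_device_batch

# Optimizer updates per epoch (assumption: last partial batch still updates).
updates_per_epoch = math.ceil(train_examples / effective_batch)

# Approximate number of updates before early stopping.
approx_total_updates = round(updates_per_epoch * epochs_run)

print(accumulation_steps, updates_per_epoch, approx_total_updates)
```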

## Confusion Matrix



## Usage

### Basic Usage

```python
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
import torch

# Load model and tokenizer
model_name = "plot-arc-classifier-deberta-small"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)

# Example text
text = "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."

# Tokenize and predict (inputs longer than 512 tokens are truncated)
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()

# Class mapping (index order matches the label IDs under Model Description)
class_names = ['NONE', 'INTERNAL', 'EXTERNAL', 'BOTH']
prediction = class_names[predicted_class]
confidence = probabilities[0, predicted_class].item()

print(f"Predicted class: {prediction} (confidence: {confidence:.3f})")
```

### Pipeline Usage

```python
from transformers import pipeline

# top_k=None returns scores for all four classes
# (it replaces the deprecated return_all_scores=True)
classifier = pipeline(
    "text-classification",
    model="plot-arc-classifier-deberta-small",
    top_k=None
)

result = classifier("Captain Torres must infiltrate enemy lines while battling his own cowardice.")
print(result)
```
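
The pipeline returns a list of `{"label", "score"}` dictionaries, which can be reduced to a single prediction. The helper below is a hypothetical sketch: it assumes labels may appear either as class names or as generic `LABEL_<index>` strings (which happens when `id2label` is not set in the model config), and the `sample` scores are illustrative, not real model output.

```python
CLASS_NAMES = ("NONE", "INTERNAL", "EXTERNAL", "BOTH")


def top_prediction(scores, class_names=CLASS_NAMES):
    """Return (class_name, score) for the best entry in pipeline-style output."""
    # Some versions and input shapes wrap the per-class list in an outer list.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    best = max(scores, key=lambda item: item["score"])
    label = best["label"]
    # Map generic labels such as "LABEL_3" to the documented class names.
    if label.startswith("LABEL_"):
        label = class_names[int(label.split("_", 1)[1])]
    return label, best["score"]


# Illustrative scores only.
sample = [
    {"label": "LABEL_0", "score": 0.05},
    {"label": "LABEL_1", "score": 0.15},
    {"label": "LABEL_2", "score": 0.10},
    {"label": "LABEL_3", "score": 0.70},
]
print(top_prediction(sample))
```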

## Example Classifications

| Class | Type | Example | Prediction | Confidence |
|-------|------|---------|------------|------------|
| **NONE** | Simple | *"Margaret runs the village bakery, making fresh bread every morning at 5 AM for the past thirty years."* | NONE ✅ | 0.997 |
| **NONE** | Nuanced | *"Dr. Harrison performs routine medical check-ups with methodical precision, maintaining professional distance while patients share their deepest fears about mortality."* | NONE ⚠️ | 0.581 |
| **INTERNAL** | Simple | *"Emma struggles with overwhelming anxiety after her father's harsh criticism, questioning her self-worth and abilities."* | INTERNAL ✅ | 0.983 |
| **INTERNAL** | Nuanced | *"The renowned pianist Clara finds herself paralyzed by perfectionism, her childhood trauma surfacing as she prepares for the performance that could define her legacy."* | INTERNAL ✅ | 0.733 |
| **EXTERNAL** | Simple | *"Knight Roderick embarks on a dangerous quest to retrieve the stolen crown from the dragon's lair."* | EXTERNAL ✅ | 0.717 |
| **EXTERNAL** | Nuanced | *"Master thief Elias infiltrates the heavily guarded fortress, disabling security systems and evading patrol routes, each obstacle requiring new techniques and tools to reach the vault."* | EXTERNAL ✅ | 0.711 |
| **BOTH** | Simple | *"Sarah must rescue her kidnapped daughter from the terrorist compound while confronting her own paralyzing guilt about being an absent mother."* | BOTH ⚠️ | 0.578 |
| **BOTH** | Nuanced | *"Archaeologist Sophia discovers an ancient artifact that could rewrite history, but must confront her own ethical boundaries and childhood abandonment issues as powerful forces try to silence her."* | BOTH ✅ | 0.926 |

**Results:** 8/8 correct predictions on these examples (100% accuracy). The two ⚠️ rows are correct predictions with confidence below 0.6.
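
Low-confidence predictions such as the two ⚠️ rows can be routed to manual review with a simple threshold. The sketch below uses the confidences from the table; the `REVIEW_THRESHOLD` value of 0.6 is an assumption chosen for illustration, not a setting from the original project.

```python
# (class, example type, confidence) taken from the table above.
examples = [
    ("NONE", "Simple", 0.997),
    ("NONE", "Nuanced", 0.581),
    ("INTERNAL", "Simple", 0.983),
    ("INTERNAL", "Nuanced", 0.733),
    ("EXTERNAL", "Simple", 0.717),
    ("EXTERNAL", "Nuanced", 0.711),
    ("BOTH", "Simple", 0.578),
    ("BOTH", "Nuanced", 0.926),
]

# Hypothetical cut-off below which a prediction is flagged for human review.
REVIEW_THRESHOLD = 0.6

flagged = [row for row in examples if row[2] < REVIEW_THRESHOLD]
for cls, kind, confidence in flagged:
    print(f"review: {cls} / {kind} (confidence {confidence:.3f})")
```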

## Limitations

- **Domain:** Optimized for character descriptions in narrative fiction
- **Length:** Maximum 512 tokens (longer texts are truncated)
- **Language:** English only
- **Context:** Works best with character-focused descriptions rather than plot summaries
- **Ambiguity:** Some edge cases may be inherently ambiguous between INTERNAL/BOTH
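
One possible workaround for the 512-token limit is to split long inputs into overlapping windows and classify each window separately. This is not part of the released model or its training setup; the helper name `window_token_ids` and the `stride` default are assumptions, and the sketch ignores the special tokens a real tokenizer adds, so the usable window is slightly smaller in practice.

```python
def window_token_ids(token_ids, max_length=512, stride=128):
    """Split token IDs into windows of at most max_length tokens.

    Consecutive windows overlap by `stride` tokens so that text near a
    window boundary appears in two windows.
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must be in [0, max_length)")
    step = max_length - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break
        start += step
    return windows


# Stand-in for a tokenized description of 1,000 tokens.
fake_ids = list(range(1000))
windows = window_token_ids(fake_ids)
print([len(w) for w in windows])
```

How the per-window predictions should be combined (for example, by averaging probabilities) is left open here, since the card does not evaluate the model on long inputs.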

## Ethical Considerations

- **Bias:** Training data may contain genre/cultural biases toward certain character archetypes
- **Interpretation:** Classifications reflect Western narrative theory; other storytelling traditions may not map perfectly
- **Automation:** Should complement, not replace, human literary analysis

## Citation

```bibtex
@misc{plot_arc_classifier_2025,
  title={Plot Arc Classifier - DeBERTa Small},
  author={Claude Code Assistant},
  year={2025},
  url={https://github.com/your-org/plot-arc-classifier},
  note={Fine-tuned DeBERTa-v3-small for character plot arc classification}
}
```

## Model Card Contact

For questions about this model, please open an issue in the repository or contact the maintainers.

---

*Model trained on 2025-09-02 using the transformers library.*