T5 Legal Narrative Generation Model
Model Description
This T5-based model specializes in generating coherent legal narratives from structured legal entities and relationships. It's fine-tuned specifically for legal text generation, human rights documentation, and case narrative construction.
Developed by: Lemkin AI
Model type: T5 (Text-to-Text Transfer Transformer) for Legal Text Generation
Base model: google/flan-t5-base
Language(s): English (primary), French, Spanish
License: Apache 2.0
Model Details
Architecture
- Base Model: FLAN-T5 Base (instruction-tuned T5)
- Parameters: 248M total parameters
- Model Size: 1.0GB
- Task: Text-to-text generation for legal narratives
- Input Length: 512 tokens maximum
- Output Length: 1024 tokens maximum
- Layers: 12 encoder + 12 decoder layers
- Hidden Size: 768
- Attention Heads: 12 (see the config sketch after this list)
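These architecture values can be confirmed directly from the published model config; a minimal sketch:

```python
# A quick check of the architecture values above, read from the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("LemkinAI/t5-legal-narrative")
print(config.num_layers, config.num_decoder_layers)  # encoder / decoder layers
print(config.d_model, config.num_heads)              # hidden size, attention heads
```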
Performance Metrics
- ROUGE-L Score: 0.89 (narrative coherence)
- BLEU Score: 0.74 (text quality)
- Legal Accuracy: 0.92 (factual consistency)
- Generation Speed: ~100 tokens/second (GPU)
- Batch Throughput: 5-10 narratives generated concurrently per batch (GPU; see Performance Benchmarks)
Capabilities
Primary Functions
- Entity-to-Narrative: Convert structured legal entities into coherent prose
- Relation-based Stories: Generate narratives based on legal relationships
- Timeline Construction: Create chronological legal narratives
- Case Summaries: Generate concise case summaries from evidence
- Report Drafting: Create structured legal reports and documentation
Supported Input Formats
- Structured Entities: `entities=[person, organization, violation] relations=[perpetrator_of, occurred_at]`
- Template-based: `violation=torture, perpetrator=officer, victim=civilian, location=prison, date=2023`
- Free-form Prompts: `Generate a legal narrative about war crimes proceedings`
- Context-aware: Include background context for more accurate generation (a prompt-building sketch follows this list)
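A minimal sketch of how these formats can be rendered into model inputs. The helper functions are illustrative, not part of the model's API; only the `legal_narrative:` task prefix is documented (see Usage below).

```python
# Illustrative prompt builders for the supported input formats.
# The helper names are assumptions; only the "legal_narrative:" prefix
# is documented in this card (see Usage below).

def template_prompt(**fields) -> str:
    """Template-based input: comma-separated key=value pairs."""
    body = ", ".join(f"{k}={v}" for k, v in fields.items())
    return f"legal_narrative: {body}"

def entity_prompt(entities, relations, context=None) -> str:
    """Structured entity input: entity and relation lists plus optional context."""
    prompt = f"entities=[{', '.join(entities)}] relations=[{', '.join(relations)}]"
    if context:
        prompt += f" context={context}"
    return f"legal_narrative: {prompt}"

print(template_prompt(violation="torture", perpetrator="officer",
                      victim="civilian", location="prison", date="2023"))
# legal_narrative: violation=torture, perpetrator=officer, victim=civilian, location=prison, date=2023
```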
Usage
Quick Start
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("LemkinAI/t5-legal-narrative")
model = T5ForConditionalGeneration.from_pretrained("LemkinAI/t5-legal-narrative")

# Example prompt using the template-based input format
prompt = "Generate legal narrative: violation=arbitrary detention, perpetrator=security forces, victim=journalist, location=capital city, date=March 2023"

# Prepend the task prefix and tokenize
input_text = f"legal_narrative: {prompt}"
input_ids = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True).input_ids

# Generate the narrative. Combining num_beams with do_sample performs
# beam-search multinomial sampling; set do_sample=False for deterministic beams.
with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_length=1024,
        num_beams=4,
        early_stopping=True,
        temperature=0.7,
        do_sample=True,
        top_p=0.9,
    )

# Decode and print
narrative = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(narrative)
```
Advanced Usage with Custom Parameters
```python
# Structured entity input
entities = {
    "persons": ["Ahmed Hassan", "Colonel Smith"],
    "organizations": ["Human Rights Commission", "Military Unit 302"],
    "violations": ["forced disappearance", "torture"],
    "locations": ["detention facility", "border region"],
    "dates": ["January 2023", "ongoing"],
}

# Serialize the entity dictionary into the prompt string
prompt = f"Generate narrative from entities: {entities}"
input_text = f"legal_narrative: {prompt}"

# Tokenize, truncating to the 512-token input limit
input_ids = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True).input_ids

# Generate with tuned decoding parameters
outputs = model.generate(
    input_ids,
    max_length=1024,
    num_beams=5,
    repetition_penalty=1.2,  # discourage repeated phrases
    length_penalty=1.0,      # neutral preference for output length
    early_stopping=True,
)
narrative = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
Batch Processing
```python
# Multiple narrative requests
prompts = [
    "violation=unlawful arrest, perpetrator=police, victim=protester, date=June 2023",
    "violation=property destruction, perpetrator=militia, location=village, date=July 2023",
    "violation=harassment, perpetrator=officials, victim=lawyer, context=trial proceedings",
]

# Tokenize as a single padded batch. Note that generate() has no batch_size
# argument; the batch is determined by the shape of input_ids.
input_texts = [f"legal_narrative: {prompt}" for prompt in prompts]
inputs = tokenizer(input_texts, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,  # mask out padding tokens
    max_length=1024,
    num_beams=3,
)
narratives = [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]
```
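For workloads larger than one batch, the hardware notes below suggest processing 5-10 narratives at a time. A minimal sketch, reusing the `tokenizer` and `model` loaded above, that splits a prompt list into fixed-size chunks; `CHUNK_SIZE` is an illustrative value to tune per GPU:

```python
# Chunked batch generation, reusing the tokenizer and model loaded above.
# CHUNK_SIZE is an assumption; tune it to your GPU (see the
# "5-10 narratives simultaneously" note under Hardware Requirements).
CHUNK_SIZE = 8

def generate_in_chunks(prompts, chunk_size=CHUNK_SIZE):
    narratives = []
    for i in range(0, len(prompts), chunk_size):
        chunk = [f"legal_narrative: {p}" for p in prompts[i:i + chunk_size]]
        inputs = tokenizer(chunk, return_tensors="pt", padding=True,
                           truncation=True, max_length=512)
        outputs = model.generate(
            inputs.input_ids,
            attention_mask=inputs.attention_mask,  # mask out padding
            max_length=1024,
            num_beams=3,
        )
        narratives.extend(
            tokenizer.decode(o, skip_special_tokens=True) for o in outputs
        )
    return narratives
```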
Training Data
Dataset Statistics
- Training Examples: 125,000 legal narrative pairs
- Source Documents: Legal reports, case files, court decisions
- Generated Narratives: 2.8M words of legal prose
- Entity Coverage: 71 legal entity types, 21 relation types
- Time Period: Legal cases and reports from 1990-2024
Data Sources
- International Criminal Tribunals: ICC, ICTY, ICTR case documents
- Human Rights Reports: UN, Amnesty International, Human Rights Watch
- Legal Case Files: Court proceedings and legal documentation
- Investigation Reports: Fact-finding missions and inquiries
- Expert Annotations: Legal professional review and validation
Language Distribution
- English: 85% (primary training language)
- French: 10% (legal French from international courts)
- Spanish: 5% (Inter-American legal documents)
Training Details
Training Configuration
- Base Model: google/flan-t5-base (instruction-tuned)
- Training Steps: 50,000
- Batch Size: 16 (8 per device, 2 devices)
- Learning Rate: 5e-5 with cosine decay
- Warmup Steps: 2,500
- Training Time: 24 hours on 2x V100 GPUs
- Optimization: AdamW with gradient clipping (a configuration sketch follows this list)
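For reference, a sketch of `Seq2SeqTrainingArguments` consistent with the configuration above. This is an approximation for illustration, not the original training script; `output_dir`, `fp16`, and the clipping norm are assumptions.

```python
# An approximation of the stated training configuration; not the original script.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-legal-narrative",    # hypothetical output path
    max_steps=50_000,                   # 50,000 training steps
    per_device_train_batch_size=8,      # 8 per device x 2 devices = 16 effective
    learning_rate=5e-5,
    lr_scheduler_type="cosine",         # cosine decay
    warmup_steps=2_500,
    optim="adamw_torch",                # AdamW optimizer
    max_grad_norm=1.0,                  # gradient clipping (norm value assumed)
    fp16=True,                          # assumption: mixed precision on V100s
)
```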
Fine-tuning Strategy
- Task-specific Prefixes: "legal_narrative:", "case_summary:", "timeline:" (see the inference sketch after this list)
- Multi-task Learning: Narrative generation + summarization + Q&A
- Legal Domain Adaptation: Specialized vocabulary and legal terminology
- Quality Filtering: Human expert validation of generated outputs
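A minimal sketch of how the task prefixes could be applied at inference, reusing the `tokenizer` and `model` loaded above. Only `legal_narrative:` is demonstrated elsewhere in this card, so the payloads for the other two prefixes are illustrative assumptions.

```python
# Applying each task prefix at inference; the case_summary and timeline
# payloads are illustrative assumptions.
tasks = {
    "legal_narrative": "violation=torture, perpetrator=officer, victim=civilian",
    "case_summary": "Court proceedings on the March 2023 detention of a journalist",
    "timeline": "events=arrest, interrogation, transfer, release",
}

for prefix, body in tasks.items():
    ids = tokenizer(f"{prefix}: {body}", return_tensors="pt",
                    max_length=512, truncation=True).input_ids
    out = model.generate(ids, max_length=1024, num_beams=4, early_stopping=True)
    print(prefix, "->", tokenizer.decode(out[0], skip_special_tokens=True))
```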
Evaluation Results
Generation Quality Metrics
| Metric | Score | Description |
|---|---|---|
| ROUGE-L | 0.89 | Longest common subsequence overlap |
| ROUGE-1 | 0.86 | Unigram overlap with reference |
| ROUGE-2 | 0.73 | Bigram overlap with reference |
| BLEU | 0.74 | N-gram precision and brevity |
| METEOR | 0.81 | Alignment-based semantic similarity |
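Scores of this kind can be computed with the `evaluate` library; a minimal sketch, where the prediction/reference strings are placeholders rather than data from the actual evaluation set:

```python
# A minimal scoring sketch; the prediction/reference pair is a placeholder.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["Security forces detained the journalist in March 2023."]
references = ["Security forces arbitrarily detained a journalist in March 2023."]

print(rouge.compute(predictions=predictions, references=references))   # includes rougeL
print(bleu.compute(predictions=predictions, references=[references]))  # bleu expects nested references
```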
Legal-Specific Evaluation
| Aspect | Score | Evaluation Method |
|---|---|---|
| Factual Accuracy | 0.92 | Expert legal review |
| Legal Coherence | 0.88 | Logical flow assessment |
| Entity Consistency | 0.94 | Entity mention accuracy |
| Timeline Accuracy | 0.91 | Chronological ordering |
| Terminology Usage | 0.89 | Legal term appropriateness |
Cross-Language Performance
| Language | ROUGE-L | BLEU | Notes |
|---|---|---|---|
| English | 0.89 | 0.74 | Primary training language |
| French | 0.82 | 0.67 | Strong performance on legal French |
| Spanish | 0.79 | 0.63 | Good performance on formal legal Spanish |
Use Cases
Primary Applications
- Human Rights Documentation: Generate narrative reports from evidence
- Legal Case Preparation: Create case summaries and timelines
- Investigation Reports: Structure findings into coherent narratives
- Academic Research: Generate legal case studies and examples
- Training Materials: Create legal education content
Specialized Applications
- Court Proceedings: Draft narrative sections of legal documents
- NGO Reporting: Generate human rights violation narratives
- Journalism: Create structured stories from legal information
- Compliance Documentation: Generate regulatory narrative reports
- Legal AI Systems: Component for larger legal analysis platforms
Input Format Examples
Template-Based Input
```text
violation=forced displacement, perpetrator=armed group, victim=civilian population,
location=northern region, date=August 2023, context=armed conflict,
evidence=witness testimony, impact=humanitarian crisis
```
Structured Entity Input
```text
entities=[Maria Rodriguez, Constitutional Court, freedom of expression, social media post,
criminal charges] relations=[defendant_in, violation_of, charged_with]
context=legal proceedings for online criticism
```
Free-Form Prompt
```text
Generate a legal narrative about arbitrary detention of journalists during protests,
including timeline, legal violations, and international law context
```
Limitations and Considerations
Technical Limitations
- Context Length: Limited to 512 input tokens and 1024 output tokens (a length-check sketch follows this list)
- Language Performance: Best on English, with progressively lower quality in French and Spanish
- Domain Specificity: Optimized for legal text, may not perform well on general content
- Factual Verification: Generated content requires expert legal review
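A minimal guard for the 512-token input limit, reusing the `tokenizer` loaded above; the helper and its warning behavior are illustrative.

```python
# Warn before silently truncating long inputs; the helper is illustrative.
MAX_INPUT_TOKENS = 512

def check_input_length(prompt: str) -> str:
    text = f"legal_narrative: {prompt}"
    n_tokens = len(tokenizer(text).input_ids)
    if n_tokens > MAX_INPUT_TOKENS:
        print(f"Warning: input is {n_tokens} tokens; content beyond "
              f"{MAX_INPUT_TOKENS} will be truncated.")
    return text
```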
Content Considerations
- Accuracy Requirements: Legal narratives must be factually accurate
- Bias Potential: May reflect biases present in training legal documents
- Completeness: Generated narratives may omit important legal details
- Consistency: May generate contradictory information across long texts
Legal and Ethical Considerations
- Professional Review Required: All generated content needs legal expert validation
- Not Legal Advice: Generated narratives are for informational purposes only
- Confidentiality: Should not be used with confidential legal information
- Accountability: Human oversight required for all legal applications
Hardware Requirements
Minimum Requirements
- RAM: 8GB system memory
- Storage: 2GB available space
- GPU: Optional but recommended (4GB VRAM minimum)
- CPU: Multi-core processor for reasonable inference speed
Recommended Requirements
- RAM: 16GB system memory
- Storage: 5GB available space (including dependencies)
- GPU: 8GB VRAM for optimal performance
- CPU: High-performance multi-core processor
Performance Benchmarks
- CPU Inference: ~10 tokens/second (narrative generation)
- GPU Inference: ~100 tokens/second (narrative generation)
- Memory Usage: ~4GB GPU VRAM, 6GB system RAM
- Batch Processing: 5-10 narratives simultaneously on recommended hardware (a throughput-measurement sketch follows this list)
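A rough way to reproduce the token-throughput figures, reusing the `tokenizer` and `model` loaded above; results vary with hardware and decoding settings, so treat the numbers above as indicative only.

```python
# Rough throughput measurement; results depend on hardware and settings.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

ids = tokenizer("legal_narrative: violation=torture, perpetrator=officer",
                return_tensors="pt").input_ids.to(device)

start = time.perf_counter()
with torch.no_grad():
    out = model.generate(ids, max_length=256, num_beams=1)
elapsed = time.perf_counter() - start
print(f"~{out.shape[-1] / elapsed:.1f} tokens/second on {device}")
```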
Model Card Contact
For questions about this model, technical support, or collaboration opportunities:
- Repository: GitHub - Lemkin AI Models
- Issues: Report issues or bugs
- Discussions: Community discussions
Citation
```bibtex
@misc{lemkin-t5-legal-narrative-2025,
  title={T5 Legal Narrative Generation Model},
  author={Lemkin AI Team},
  year={2025},
  url={https://huggingface.co/LemkinAI/t5-legal-narrative},
  note={Specialized model for generating legal narratives from structured entities and relationships}
}
```