T5 Legal Narrative Generation Model

Model Description

This T5-based model specializes in generating coherent legal narratives from structured legal entities and relationships. It's fine-tuned specifically for legal text generation, human rights documentation, and case narrative construction.

Developed by: Lemkin AI
Model type: T5 (Text-to-Text Transfer Transformer) for Legal Text Generation
Base model: google/flan-t5-base
Language(s): English (primary), French, Spanish
License: Apache 2.0

Model Details

Architecture

  • Base Model: FLAN-T5 Base (instruction-tuned T5)
  • Parameters: 248M total parameters
  • Model Size: 1.0GB
  • Task: Text-to-text generation for legal narratives
  • Input Length: 512 tokens maximum
  • Output Length: 1024 tokens maximum
  • Layers: 12 encoder + 12 decoder layers
  • Hidden Size: 768
  • Attention Heads: 12
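
The parameter count and the model size listed above can be related with simple arithmetic. The sketch below assumes the weights are stored as 32-bit floats (4 bytes each); the card does not state the storage dtype, so treat the figures as a rough check rather than a specification. The function name `checkpoint_size_gb` is hypothetical.

```python
# Back-of-the-envelope weight size from the parameter count.
# Assumption: 4 bytes per parameter (float32); the storage dtype is not stated in the card.

def checkpoint_size_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

fp32_gb = checkpoint_size_gb(248_000_000)                     # float32 weights
fp16_gb = checkpoint_size_gb(248_000_000, bytes_per_param=2)  # hypothetical half-precision copy

print(f"float32: {fp32_gb:.3f} GB, float16: {fp16_gb:.3f} GB")
```

Under that assumption, 248M parameters come to roughly 0.99 GB, which is in line with the 1.0GB figure.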

Performance Metrics

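  • ROUGE-L Score: 0.89 (longest common subsequence overlap with reference narratives)
  • BLEU Score: 0.74 (n-gram precision against reference narratives)
  • Legal Accuracy: 0.92 (factual accuracy, assessed by expert legal review)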
  • Generation Speed: ~100 tokens/second (GPU)
  • Throughput: ~10 narratives/second (GPU)

Capabilities

Primary Functions

  1. Entity-to-Narrative: Convert structured legal entities into coherent prose
  2. Relation-based Stories: Generate narratives based on legal relationships
  3. Timeline Construction: Create chronological legal narratives
  4. Case Summaries: Generate concise case summaries from evidence
  5. Report Drafting: Create structured legal reports and documentation

Supported Input Formats

  • Structured Entities: entities=[person, organization, violation] relations=[perpetrator_of, occurred_at]
  • Template-based: violation=torture, perpetrator=officer, victim=civilian, location=prison, date=2023
  • Free-form Prompts: Generate a legal narrative about war crimes proceedings
  • Context-aware: Include background context for more accurate generation

Usage

Quick Start

from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("LemkinAI/t5-legal-narrative")
model = T5ForConditionalGeneration.from_pretrained("LemkinAI/t5-legal-narrative")

# Example prompt
prompt = "Generate legal narrative: violation=arbitrary detention, perpetrator=security forces, victim=journalist, location=capital city, date=March 2023"

# Prepare input
input_text = f"legal_narrative: {prompt}"
input_ids = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True).input_ids

# Generate narrative
with torch.no_grad():
    outputs = model.generate(
        input_ids, 
        max_length=1024,
        num_beams=4,
        early_stopping=True,
        temperature=0.7,
        do_sample=True,
        top_p=0.9
    )

# Decode and print
narrative = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(narrative)

Advanced Usage with Custom Parameters

# Structured entity input
entities = {
    "persons": ["Ahmed Hassan", "Colonel Smith"],
    "organizations": ["Human Rights Commission", "Military Unit 302"],
    "violations": ["forced disappearance", "torture"],
    "locations": ["detention facility", "border region"],
    "dates": ["January 2023", "ongoing"]
}

# Format prompt
prompt = f"Generate narrative from entities: {entities}"
input_text = f"legal_narrative: {prompt}"

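# Generate with custom beam-search parameters
# Truncate to the 512-token input limit so long entity lists do not overflow the encoder
input_ids = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True).input_ids
outputs = model.generate(
    input_ids,
    max_length=1024,
    num_beams=5,
    repetition_penalty=1.2,
    length_penalty=1.0,
    early_stopping=True
)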

narrative = tokenizer.decode(outputs[0], skip_special_tokens=True)

Batch Processing

# Multiple narrative requests
prompts = [
    "violation=unlawful arrest, perpetrator=police, victim=protester, date=June 2023",
    "violation=property destruction, perpetrator=militia, location=village, date=July 2023",
    "violation=harassment, perpetrator=officials, victim=lawyer, context=trial proceedings"
]

# Batch generate
input_texts = [f"legal_narrative: {prompt}" for prompt in prompts]
inputs = tokenizer(input_texts, return_tensors="pt", padding=True, truncation=True)

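# generate() infers the batch size from the input tensor; it takes no batch_size argument
outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,  # mask out padding tokens in the batch
    max_length=1024,
    num_beams=3
)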

narratives = [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]
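
For prompt lists longer than a single batch can hold, one option is to split them into fixed-size groups before tokenizing. The following is a minimal sketch; the helper name `batched` and the batch size of 2 in the example are illustrative assumptions, not part of the model's API.

```python
from typing import Iterator, List

def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical prompts; each group would be tokenized and generated separately.
example_prompts = ["prompt A", "prompt B", "prompt C", "prompt D", "prompt E"]
groups = list(batched(example_prompts, batch_size=2))
print(groups)
```

Each group can then be passed through the tokenizer and `model.generate` exactly as in the batch example, which keeps peak memory bounded by the chosen batch size.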

Training Data

Dataset Statistics

  • Training Examples: 125,000 legal narrative pairs
  • Source Documents: Legal reports, case files, court decisions
  • Generated Narratives: 2.8M words of legal prose
  • Entity Coverage: 71 legal entity types, 21 relation types
  • Time Period: Legal cases and reports from 1990-2024

Data Sources

  • International Criminal Tribunals: ICC, ICTY, ICTR case documents
  • Human Rights Reports: UN, Amnesty International, Human Rights Watch
  • Legal Case Files: Court proceedings and legal documentation
  • Investigation Reports: Fact-finding missions and inquiries
  • Expert Annotations: Legal professional review and validation

Language Distribution

  • English: 85% (primary training language)
  • French: 10% (legal French from international courts)
  • Spanish: 5% (Inter-American legal documents)

Training Details

Training Configuration

  • Base Model: google/flan-t5-base (instruction-tuned)
  • Training Steps: 50,000
  • Batch Size: 16 (8 per device, 2 devices)
  • Learning Rate: 5e-5 with cosine decay
  • Warmup Steps: 2,500
  • Training Time: 24 hours on 2x V100 GPUs
  • Optimization: AdamW with gradient clipping

Fine-tuning Strategy

  • Task-specific Prefixes: "legal_narrative:", "case_summary:", "timeline:"
  • Multi-task Learning: Narrative generation + summarization + Q&A
  • Legal Domain Adaptation: Specialized vocabulary and legal terminology
  • Quality Filtering: Human expert validation of generated outputs

Evaluation Results

Generation Quality Metrics

Metric    Score   Description
ROUGE-L   0.89    Longest common subsequence overlap
ROUGE-1   0.86    Unigram overlap with reference
ROUGE-2   0.73    Bigram overlap with reference
BLEU      0.74    N-gram precision and brevity
METEOR    0.81    Alignment-based semantic similarity
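
To make the ROUGE-L entry concrete, here is a minimal sketch of an LCS-based F1 score. It assumes lowercase whitespace tokenization and a plain F1 of LCS precision and recall; the card does not say which ROUGE implementation or variant produced the reported scores, so this is illustrative only.

```python
from typing import List

def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """F1 of LCS-based precision and recall over whitespace tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical sentence pair: five of the six tokens match in order.
score = rouge_l_f1(
    "The journalist was arrested without charge",
    "The journalist was detained without charge",
)
print(round(score, 4))
```

In this example the longest common subsequence has five tokens out of six on each side, so precision, recall, and F1 are all 5/6.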

Legal-Specific Evaluation

Aspect               Score   Evaluation Method
Factual Accuracy     0.92    Expert legal review
Legal Coherence      0.88    Logical flow assessment
Entity Consistency   0.94    Entity mention accuracy
Timeline Accuracy    0.91    Chronological ordering
Terminology Usage    0.89    Legal term appropriateness
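
A cheap automated pre-check for entity consistency is to test whether each input entity is mentioned in the generated text. The sketch below is not the evaluation method behind the scores above, which the card does not describe in detail; it is a simple case-insensitive substring check, and the function names are hypothetical.

```python
from typing import Dict, List

def entity_mention_report(entities: List[str], narrative: str) -> Dict[str, bool]:
    """Map each entity to whether it appears (case-insensitively) in the narrative."""
    text = narrative.casefold()
    return {entity: entity.casefold() in text for entity in entities}

def mention_rate(entities: List[str], narrative: str) -> float:
    """Fraction of input entities that are mentioned at least once."""
    report = entity_mention_report(entities, narrative)
    return sum(report.values()) / len(report) if report else 0.0

# Hypothetical generated text, reusing entity names from the usage examples.
sample_entities = ["Ahmed Hassan", "Military Unit 302", "border region"]
sample_narrative = "Ahmed Hassan was detained by Military Unit 302 in January 2023."
report = entity_mention_report(sample_entities, sample_narrative)
rate = mention_rate(sample_entities, sample_narrative)
print(report, round(rate, 3))
```

Substring matching misses paraphrases and pronoun references, so a low rate flags a narrative for closer review and does not prove an error.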

Cross-Language Performance

Language   ROUGE-L   BLEU   Notes
English    0.89      0.74   Primary training language
French     0.82      0.67   Strong performance on legal French
Spanish    0.79      0.63   Good performance on formal legal Spanish

Use Cases

Primary Applications

  • Human Rights Documentation: Generate narrative reports from evidence
  • Legal Case Preparation: Create case summaries and timelines
  • Investigation Reports: Structure findings into coherent narratives
  • Academic Research: Generate legal case studies and examples
  • Training Materials: Create legal education content

Specialized Applications

  • Court Proceedings: Draft narrative sections of legal documents
  • NGO Reporting: Generate human rights violation narratives
  • Journalism: Create structured stories from legal information
  • Compliance Documentation: Generate regulatory narrative reports
  • Legal AI Systems: Component for larger legal analysis platforms

Input Format Examples

Template-Based Input

violation=forced displacement, perpetrator=armed group, victim=civilian population, 
location=northern region, date=August 2023, context=armed conflict, 
evidence=witness testimony, impact=humanitarian crisis
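
A template-based prompt like the one above can be assembled from a dictionary. This is a minimal sketch: the helper name `build_template_prompt` is hypothetical, and joining the `key=value` pairs with a comma and a space is inferred from the example rather than documented.

```python
from typing import Mapping

def build_template_prompt(fields: Mapping[str, str]) -> str:
    """Join non-empty fields as 'key=value' pairs in insertion order."""
    return ", ".join(f"{key}={value}" for key, value in fields.items() if value)

template_prompt = build_template_prompt({
    "violation": "forced displacement",
    "perpetrator": "armed group",
    "location": "",            # empty values are skipped
    "date": "August 2023",
})
print(template_prompt)
```

Values that themselves contain commas may be ambiguous in this format, so it is safest to keep each field value short and comma-free.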

Structured Entity Input

entities=[Maria Rodriguez, Constitutional Court, freedom of expression, social media post, 
criminal charges] relations=[defendant_in, violation_of, charged_with] 
context=legal proceedings for online criticism
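
The structured format above can be produced from Python lists in the same way. The sketch below mirrors the bracketed layout of the example; the helper name `build_entity_prompt` is hypothetical, and the exact delimiters the model expects are inferred from this example only.

```python
from typing import Sequence

def build_entity_prompt(
    entities: Sequence[str],
    relations: Sequence[str],
    context: str = "",
) -> str:
    """Format entities and relations as 'entities=[...] relations=[...] context=...'."""
    prompt = f"entities=[{', '.join(entities)}] relations=[{', '.join(relations)}]"
    if context:
        prompt += f" context={context}"
    return prompt

entity_prompt = build_entity_prompt(
    ["Maria Rodriguez", "Constitutional Court"],
    ["defendant_in"],
    context="legal proceedings",
)
print(entity_prompt)
```

Building the string explicitly avoids passing a raw Python `dict` or `list` representation, with its quotes and braces, into the prompt.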

Free-Form Prompt

Generate a legal narrative about arbitrary detention of journalists during protests, 
including timeline, legal violations, and international law context

Limitations and Considerations

Technical Limitations

  • Context Length: Limited to 512 input tokens and 1024 output tokens
  • Language Performance: Best on English, decreasing quality on other languages
  • Domain Specificity: Optimized for legal text, may not perform well on general content
  • Factual Verification: Generated content requires expert legal review
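
One way to work within the 512-token input limit is to pack fields greedily into several prompts, each under a token budget. The sketch below is illustrative: `pack_fields` is a hypothetical helper, and the example uses a whitespace word count as a stand-in for real tokenization. With the tokenizer from the Quick Start, `lambda text: len(tokenizer(text).input_ids)` could serve as the counter instead.

```python
from typing import Callable, List

MAX_INPUT_TOKENS = 512  # input limit stated above

def pack_fields(
    fields: List[str],
    count_tokens: Callable[[str], int],
    limit: int = MAX_INPUT_TOKENS,
) -> List[str]:
    """Greedily join fields with ', ' into chunks whose token count stays within `limit`.

    A single field that exceeds the limit on its own is kept as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    for field in fields:
        candidate = ", ".join(current + [field])
        if current and count_tokens(candidate) > limit:
            chunks.append(", ".join(current))
            current = [field]
        else:
            current.append(field)
    if current:
        chunks.append(", ".join(current))
    return chunks

# Stand-in counter (assumption): one token per whitespace-separated word.
word_count = lambda text: len(text.split())
packed = pack_fields(["a b", "c d", "e f"], word_count, limit=4)
print(packed)
```

Splitting a prompt this way drops shared context between chunks, so key fields such as the violation may need to be repeated in each chunk, and the budget should leave room for the task prefix.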

Content Considerations

  • Accuracy Requirements: Legal narratives must be factually accurate
  • Bias Potential: May reflect biases present in training legal documents
  • Completeness: Generated narratives may omit important legal details
  • Consistency: May generate contradictory information across long texts

Legal and Ethical Considerations

  • Professional Review Required: All generated content needs legal expert validation
  • Not Legal Advice: Generated narratives are for informational purposes only
  • Confidentiality: Should not be used with confidential legal information
  • Accountability: Human oversight required for all legal applications

Hardware Requirements

Minimum Requirements

  • RAM: 8GB system memory
  • Storage: 2GB available space
  • GPU: Optional but recommended (4GB VRAM minimum)
  • CPU: Multi-core processor for reasonable inference speed

Recommended Requirements

  • RAM: 16GB system memory
  • Storage: 5GB available space (including dependencies)
  • GPU: 8GB VRAM for optimal performance
  • CPU: High-performance multi-core processor

Performance Benchmarks

  • CPU Inference: ~10 tokens/second (narrative generation)
  • GPU Inference: ~100 tokens/second (narrative generation)
  • Memory Usage: ~4GB GPU VRAM, 6GB system RAM
  • Batch Processing: 5-10 narratives simultaneously on recommended hardware
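
The token rates above translate into a rough generation time for a single narrative. This sketch assumes a constant decoding rate and one sequence at a time, which real hardware will only approximate; the function name `generation_seconds` is hypothetical.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimated wall-clock time to generate `num_tokens` at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second

# Worst case: a full 1024-token output at the rates listed above.
gpu_seconds = generation_seconds(1024, 100.0)
cpu_seconds = generation_seconds(1024, 10.0)
print(f"GPU: {gpu_seconds:.2f} s, CPU: {cpu_seconds:.1f} s")
```

Under these assumptions, a maximum-length narrative takes on the order of ten seconds on GPU and well over a minute on CPU; shorter outputs scale down proportionally.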

Model Card Contact

For questions about this model, technical support, or collaboration opportunities:

Citation

@misc{lemkin-t5-legal-narrative-2025,
  title={T5 Legal Narrative Generation Model},
  author={Lemkin AI Team},
  year={2025},
  url={https://huggingface.co/LemkinAI/t5-legal-narrative},
  note={Specialized model for generating legal narratives from structured entities and relationships}
}