T5 Email Summarizer - Brief & Full

Model Description

This is a fine-tuned T5-small model specialized for email summarization. The model can generate both brief (one-line) and detailed (comprehensive) summaries of emails, and is robust to messy, informal inputs with typos and abbreviations.

Model Details

Key Features

  • Dual-mode summarization: Supports both summarize_brief: and summarize_full: prefixes
  • Robust to informal text: Handles typos, abbreviations, and casual language
  • Fast inference: Based on T5-small (60M parameters)
  • Production-ready: Optimized for real-world email processing

Intended Uses & Limitations

Intended Uses

  • Summarizing professional and casual emails
  • Email inbox management and prioritization
  • Meeting invitation processing
  • Customer support email triage
  • Newsletter summarization

Limitations

  • Maximum input length: 512 tokens (~2500 characters)
  • English language only
  • May miss nuanced context in very long email threads
  • Not suitable for legal or medical document summarization

Training Data

The model was fine-tuned on the argilla/FinePersonas-Conversations-Email-Summaries dataset, which contains 364,000 email-summary pairs spanning:

  • Professional business emails
  • Casual informal communications
  • Meeting invitations and updates
  • Technical discussions
  • Customer service interactions

Training data was augmented with (see the sketch after this list):

  • Subtle typos and misspellings (50% of examples)
  • Common abbreviations (pls, thx, tmrw, etc.)
  • Various formatting styles
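
The exact augmentation pipeline is not published; the sketch below only illustrates the kinds of noise described above, using hypothetical helpers (add_typo, augment_email) and an abbreviation table trimmed to a few entries.

import random

# Hypothetical augmentation helpers; rates and rules are illustrative only
ABBREVIATIONS = {"please": "pls", "thanks": "thx", "tomorrow": "tmrw"}

def add_typo(word):
    # Swap two adjacent characters to simulate a subtle typo
    if len(word) < 4:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def augment_email(text, p_typo=0.05):
    # Replace known words with abbreviations and inject occasional typos
    words = []
    for word in text.split():
        if word.lower() in ABBREVIATIONS:
            words.append(ABBREVIATIONS[word.lower()])
        elif random.random() < p_typo:
            words.append(add_typo(word))
        else:
            words.append(word)
    return " ".join(words)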

Training Procedure

Training Details

  • Base model: Falconsai/text_summarization
  • Training epochs: 1
  • Batch size: 64
  • Learning rate: 3e-4
  • Validation split: 1%
  • Hardware: NVIDIA GPU with mixed precision (fp16)
  • Framework: HuggingFace Transformers 4.36.0 (see the configuration sketch below)
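
These hyperparameters map onto HuggingFace's Seq2SeqTrainingArguments roughly as follows. This is a reconstruction from the list above, not the original training script; output_dir is a placeholder.

from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the configuration described above
training_args = Seq2SeqTrainingArguments(
    output_dir="t5-small-email-summarizer",  # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=64,
    learning_rate=3e-4,
    fp16=True,  # mixed precision, per the hardware note above
)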

Preprocessing

  • Emails formatted as: Subject: [subject]. Body: [content]
  • Two training examples created per email (brief and full summaries), as sketched below
  • Data augmentation applied to 50% of examples
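
A minimal sketch of the pairing step, assuming hypothetical field names (subject, body, brief_summary, full_summary) that may differ from the actual dataset schema:

def build_training_examples(record):
    # One email yields two (input, target) pairs, one per summary mode
    email = f"Subject: {record['subject']}. Body: {record['body']}"
    return [
        {"input": f"summarize_brief: {email}", "target": record["brief_summary"]},
        {"input": f"summarize_full: {email}", "target": record["full_summary"]},
    ]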

How to Use

Installation

pip install transformers torch sentencepiece

Basic Usage

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("wordcab/t5-small-email-summarizer")
model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")

# Example email
email = """Subject: Team Meeting Tomorrow. Body: Hi everyone, 
Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST. 
Please prepare your status updates and any blockers you're facing. 
We'll also discuss the Q4 roadmap. Thanks!"""

# Generate brief summary
inputs = tokenizer(f"summarize_brief: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=50, num_beams=2)
brief_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Brief: {brief_summary}")

# Generate full summary
inputs = tokenizer(f"summarize_full: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=150, num_beams=2)
full_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Full: {full_summary}")

Advanced Usage with Long Emails

For emails longer than 512 tokens, consider chunking the input, summarizing each chunk, and then summarizing the concatenated partial summaries:

def summarize_long_email(email, model, tokenizer, mode="brief"):
    prefix = f"summarize_{mode}:" if mode in ["brief", "full"] else "summarize:"
    max_summary = 150 if mode == "full" else 50

    # Check if the email fits in the context window
    tokens = tokenizer.encode(email)
    if len(tokens) <= 500:  # Leave room for the task prefix
        # Direct summarization
        inputs = tokenizer(f"{prefix} {email}", return_tensors="pt", max_length=512, truncation=True)
        outputs = model.generate(**inputs, max_length=max_summary)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Longer emails: summarize overlapping chunks, then summarize the joined partials
    chunks = [tokenizer.decode(tokens[i:i + 450], skip_special_tokens=True)
              for i in range(0, len(tokens), 400)]
    partials = [summarize_long_email(chunk, model, tokenizer, mode) for chunk in chunks]
    return summarize_long_email(" ".join(partials), model, tokenizer, mode)

Performance Metrics

Evaluation Results

  • ROUGE-L Score: 0.42 (see below for computing ROUGE-L on your own outputs)
  • Average inference time: 0.63s (brief), 0.71s (full) on T4 GPU
  • Coherence score on messy inputs: 80%
  • Successfully differentiates brief vs full summaries (2.5x length difference)
  • Robust handling of informal text and typos
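
The evaluation set itself is not published, but you can compute ROUGE-L on your own outputs with the evaluate library (pip install evaluate rouge_score); the prediction and reference strings below are illustrative:

import evaluate

rouge = evaluate.load("rouge")

predictions = ["Reminder: weekly team meeting tomorrow at 2 PM EST; bring status updates."]
references = ["Reminder about the weekly team meeting tomorrow at 2 PM EST, covering status updates, blockers, and the Q4 roadmap."]

# Returns rouge1/rouge2/rougeL/rougeLsum scores
scores = rouge.compute(predictions=predictions, references=references)
print(scores["rougeL"])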

Deployment

Using HuggingFace Inference API

import requests

API_URL = "https://api-inference.huggingface.co/models/wordcab/t5-small-email-summarizer"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm to discuss the project.",
})
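
A cold model returns HTTP 503 while it loads; the hosted API accepts a wait_for_model option so the request blocks until the model is ready:

output = query({
    "inputs": "summarize_full: Subject: Meeting. Body: Let's meet tomorrow at 3pm to discuss the project.",
    "options": {"wait_for_model": True},
})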

Using Text Generation Inference (TGI)

docker run --gpus all -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id wordcab/t5-small-email-summarizer \
  --max-input-length 512 \
  --max-total-tokens 662
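
Once the server is up, you can query TGI's /generate endpoint on the mapped port (localhost:8080 in the command above):

import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm.",
        "parameters": {"max_new_tokens": 50},
    },
)
print(response.json()["generated_text"])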

Model Architecture

T5ForConditionalGeneration(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 512)
    (block): ModuleList(
      (0-5): T5Block(...)
    )
    (final_layer_norm): T5LayerNorm()
    (dropout): Dropout(p=0.1)
  )
  (decoder): T5Stack(...)
  (lm_head): Linear(in_features=512, out_features=32128, bias=False)
)

Citation

If you use this model, please cite:

@misc{wordcab2025t5email,
  title={T5 Email Summarizer - Brief & Full},
  author={Wordcab Team},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/wordcab/t5-small-email-summarizer}
}

License

This model is released under the Apache 2.0 License.

Contact

For questions or feedback, please open an issue on the model repository.
