---
language: en
license: apache-2.0
base_model: Falconsai/text_summarization
tags:
- summarization
- email
- t5
- text2text-generation
- brief-summary
- full-summary
datasets:
- argilla/FinePersonas-Conversations-Email-Summaries
metrics:
- rouge
widget:
- text: "summarize_brief: Subject: Team Meeting Tomorrow. Body: Hi everyone, Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST. Please prepare your status updates and any blockers you're facing. We'll also discuss the Q4 roadmap. Thanks!"
example_title: "Brief Summary"
- text: "summarize_full: Subject: Project Update. Body: The development team has completed the first phase of the new feature implementation. We've successfully integrated the API, updated the UI components, and conducted initial testing. The performance improvements show a 40% reduction in load time. Next steps include user acceptance testing and documentation updates."
example_title: "Full Summary"
- text: "summarize_brief: Subject: lunch mtg. Body: hey guys, cant make lunch today bc stuck in traffic. can we do tmrw at 1pm instead? lmk what works. thx!"
example_title: "Messy Email (Brief)"
model-index:
- name: t5-small-email-summarizer
results:
- task:
type: summarization
name: Email Summarization
dataset:
type: argilla/FinePersonas-Conversations-Email-Summaries
name: FinePersonas Email Summaries
metrics:
- type: rouge-l
value: 0.42
name: ROUGE-L
pipeline_tag: summarization
library_name: transformers
---
# T5 Email Summarizer - Brief & Full
## Model Description
This is a fine-tuned T5-small model specialized for email summarization. The model generates both brief (one-line) and full (detailed, multi-sentence) summaries of emails, and is robust to messy, informal inputs with typos and abbreviations.
### Model Details
- **Base Model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) (T5-small)
- **Fine-tuned by**: Wordcab Team
- **Model type**: T5 (Text-to-Text Transfer Transformer)
- **Language**: English
- **License**: Apache 2.0
- **Demo**: [Try it on Spaces](https://huggingface.co/spaces/wordcab/t5-email-summarizer-demo)
### Key Features
- **Dual-mode summarization**: Supports both `summarize_brief:` and `summarize_full:` prefixes
- **Robust to informal text**: Handles typos, abbreviations, and casual language
- **Fast inference**: Based on T5-small (60M parameters)
- **Production-ready**: Optimized for real-world email processing
## Intended Uses & Limitations
### Intended Uses
- Summarizing professional and casual emails
- Email inbox management and prioritization
- Meeting invitation processing
- Customer support email triage
- Newsletter summarization
### Limitations
- Maximum input length: 512 tokens (~2500 characters)
- English language only
- May miss nuanced context in very long email threads
- Not suitable for legal or medical document summarization
## Training Data
The model was fine-tuned on the [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset containing 364,000 email-summary pairs with:
- Professional business emails
- Casual informal communications
- Meeting invitations and updates
- Technical discussions
- Customer service interactions
Training data was augmented with:
- Subtle typos and misspellings (50% of examples)
- Common abbreviations (pls, thx, tmrw, etc.)
- Various formatting styles
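As an illustration of how such augmentation can work, here is a hypothetical sketch (not the actual training script; the abbreviation table and per-word typo probability are assumptions beyond the 50% augmentation rate stated above):
```python
import random

# Assumed abbreviation table for illustration
ABBREVIATIONS = {"please": "pls", "thanks": "thx", "tomorrow": "tmrw", "because": "bc"}

def swap_typo(word):
    # Swap two adjacent characters to simulate a subtle misspelling
    if len(word) < 4:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def augment_email(text, p_augment=0.5, p_typo=0.05):
    # Leave half of the examples untouched, matching the 50% rate above
    if random.random() > p_augment:
        return text
    for full, abbr in ABBREVIATIONS.items():
        text = text.replace(full, abbr)
    return " ".join(swap_typo(w) if random.random() < p_typo else w for w in text.split())
```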
## Training Procedure
### Training Details
- **Base model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization)
- **Training epochs**: 1
- **Batch size**: 64
- **Learning rate**: 3e-4
- **Validation split**: 1%
- **Hardware**: NVIDIA GPU with mixed precision (fp16)
- **Framework**: HuggingFace Transformers 4.36.0
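For reference, these hyperparameters map onto `Seq2SeqTrainingArguments` roughly as follows (a sketch mirroring the settings listed above, not the team's actual training script):
```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical arguments mirroring the hyperparameters listed above
args = Seq2SeqTrainingArguments(
    output_dir="t5-small-email-summarizer",
    num_train_epochs=1,
    per_device_train_batch_size=64,
    learning_rate=3e-4,
    fp16=True,  # mixed precision
)
```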
### Preprocessing
- Emails formatted as: `Subject: [subject]. Body: [content]`
- Two training examples created per email (brief and full summaries)
- Data augmentation applied to 50% of examples
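A minimal sketch of this example construction, assuming dataset fields named `subject`, `body`, `brief_summary`, and `full_summary` (the field names are hypothetical stand-ins for the actual dataset columns):
```python
def build_training_examples(record, augment=None):
    # Format the email as "Subject: [subject]. Body: [content]"
    email = f"Subject: {record['subject']}. Body: {record['body']}"
    if augment is not None:
        email = augment(email)
    # Emit two examples per email: one brief, one full
    return [
        {"input": f"summarize_brief: {email}", "target": record["brief_summary"]},
        {"input": f"summarize_full: {email}", "target": record["full_summary"]},
    ]
```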
## How to Use
### Installation
```bash
pip install transformers torch sentencepiece
```
### Basic Usage
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("wordcab/t5-small-email-summarizer")
model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")
# Example email
email = """Subject: Team Meeting Tomorrow. Body: Hi everyone,
Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST.
Please prepare your status updates and any blockers you're facing.
We'll also discuss the Q4 roadmap. Thanks!"""
# Generate brief summary
inputs = tokenizer(f"summarize_brief: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=50, num_beams=2)
brief_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Brief: {brief_summary}")
# Generate full summary
inputs = tokenizer(f"summarize_full: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=150, num_beams=2)
full_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Full: {full_summary}")
```
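The high-level `pipeline` API also works, since the model is tagged for the `summarization` task; keep the task prefix in the input text:
```python
from transformers import pipeline

summarizer = pipeline("summarization", model="wordcab/t5-small-email-summarizer")
result = summarizer(
    "summarize_brief: Subject: Lunch. Body: Can we move lunch to 1pm tomorrow?",
    max_length=50,
)
print(result[0]["summary_text"])
```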
### Advanced Usage with Long Emails
For emails longer than 512 tokens, one option is a simple map-reduce chunking strategy: summarize each chunk separately, then summarize the concatenated partial summaries. A minimal sketch:
```python
def summarize_long_email(email, model, tokenizer, mode="brief"):
    prefix = f"summarize_{mode}:" if mode in ["brief", "full"] else "summarize:"
    max_summary_len = 150 if mode == "full" else 50
    # If the email fits in the context window (leaving room for the prefix), summarize directly
    tokens = tokenizer.encode(email)
    if len(tokens) <= 500:
        inputs = tokenizer(f"{prefix} {email}", return_tensors="pt", max_length=512, truncation=True)
        outputs = model.generate(**inputs, max_length=max_summary_len)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Map: summarize fixed-size chunks of the email
    chunks = [tokenizer.decode(tokens[i:i + 500], skip_special_tokens=True)
              for i in range(0, len(tokens), 500)]
    partials = []
    for chunk in chunks:
        inputs = tokenizer(f"{prefix} {chunk}", return_tensors="pt", max_length=512, truncation=True)
        outputs = model.generate(**inputs, max_length=max_summary_len)
        partials.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    # Reduce: summarize the concatenated partial summaries in a final pass
    combined = " ".join(partials)
    inputs = tokenizer(f"{prefix} {combined}", return_tensors="pt", max_length=512, truncation=True)
    outputs = model.generate(**inputs, max_length=max_summary_len)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```
## Performance Metrics
### Evaluation Results
- **ROUGE-L Score**: 0.42
- **Average inference time**: 0.63s (brief), 0.71s (full) on T4 GPU
- **Coherence score on messy inputs**: 80%
- Successfully differentiates brief vs full summaries (2.5x length difference)
- Robust handling of informal text and typos
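To run a comparable ROUGE-L check on your own emails, the `evaluate` library can be used (an illustrative snippet with made-up example strings, not the exact evaluation harness; requires `pip install evaluate rouge_score`):
```python
import evaluate

rouge = evaluate.load("rouge")
# Example prediction/reference pair for illustration only
predictions = ["Weekly team meeting tomorrow at 2 PM EST; bring status updates."]
references = ["Reminder: weekly team meeting tomorrow at 2 PM EST. Prepare status updates and blockers."]
scores = rouge.compute(predictions=predictions, references=references)
print(scores["rougeL"])
```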
## Deployment
### Using HuggingFace Inference API
```python
import requests
API_URL = "https://api-inference.huggingface.co/models/wordcab/t5-small-email-summarizer"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}
def query(payload):
response = requests.post(API_URL, headers=headers, json=payload)
return response.json()
output = query({
"inputs": "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm to discuss the project.",
})
```
### Using Text Generation Inference (TGI)
```bash
docker run --gpus all -p 8080:80 \
-v t5-small-email-summarizer:/model \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id wordcab/t5-small-email-summarizer \
--max-input-length 512 \
--max-total-tokens 662
```
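Once the container is up, it exposes TGI's standard `/generate` endpoint, which you can query like this (the host/port match the `docker run` command above):
```python
import requests

# Query the local TGI server started above
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm.",
        "parameters": {"max_new_tokens": 50},
    },
)
print(response.json()["generated_text"])
```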
## Model Architecture
```
T5ForConditionalGeneration(
(shared): Embedding(32128, 512)
(encoder): T5Stack(
(embed_tokens): Embedding(32128, 512)
(block): ModuleList(
(0-5): T5Block(...)
)
(final_layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1)
)
(decoder): T5Stack(...)
(lm_head): Linear(in_features=512, out_features=32128, bias=False)
)
```
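This printout can be reproduced by loading the model and printing it:
```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")
print(model)  # prints the module tree shown above
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")  # ~60M for T5-small
```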
## Citation
If you use this model, please cite:
```bibtex
@misc{wordcab2025t5email,
title={T5 Email Summarizer - Brief & Full},
author={Wordcab Team},
year={2025},
publisher={HuggingFace},
url={https://huggingface.co/wordcab/t5-small-email-summarizer}
}
```
## License
This model is released under the Apache 2.0 License.
## Acknowledgments
- Based on [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) T5 model
- Fine-tuned on [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset
- Training performed using HuggingFace Transformers
- Special thanks to the open-source community
## Contact
For questions or feedback, please open an issue on the [model repository](https://huggingface.co/wordcab/t5-small-email-summarizer/discussions).