|
--- |
|
language: en |
|
license: apache-2.0 |
|
base_model: Falconsai/text_summarization |
|
tags: |
|
- summarization |
|
- email |
|
- t5 |
|
- text2text-generation |
|
- brief-summary |
|
- full-summary |
|
datasets: |
|
- argilla/FinePersonas-Conversations-Email-Summaries |
|
metrics: |
|
- rouge |
|
widget: |
|
- text: "summarize_brief: Subject: Team Meeting Tomorrow. Body: Hi everyone, Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST. Please prepare your status updates and any blockers you're facing. We'll also discuss the Q4 roadmap. Thanks!" |
|
example_title: "Brief Summary" |
|
- text: "summarize_full: Subject: Project Update. Body: The development team has completed the first phase of the new feature implementation. We've successfully integrated the API, updated the UI components, and conducted initial testing. The performance improvements show a 40% reduction in load time. Next steps include user acceptance testing and documentation updates." |
|
example_title: "Full Summary" |
|
- text: "summarize_brief: Subject: lunch mtg. Body: hey guys, cant make lunch today bc stuck in traffic. can we do tmrw at 1pm instead? lmk what works. thx!" |
|
example_title: "Messy Email (Brief)" |
|
model-index: |
|
- name: t5-small-email-summarizer |
|
results: |
|
- task: |
|
type: summarization |
|
name: Email Summarization |
|
dataset: |
|
type: argilla/FinePersonas-Conversations-Email-Summaries |
|
name: FinePersonas Email Summaries |
|
metrics: |
|
- type: rouge-l |
|
value: 0.42 |
|
name: ROUGE-L |
|
pipeline_tag: summarization |
|
library_name: transformers |
|
--- |
|
|
|
# T5 Email Summarizer - Brief & Full |
|
|
|
## Model Description |
|
|
|
This is a fine-tuned T5-small model specialized for email summarization. It generates both brief (one-line) and full (multi-sentence) summaries, selected by a task prefix, and is robust to messy, informal input containing typos and abbreviations. |
|
|
|
### Model Details |
|
- **Base Model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) (T5-small) |
|
- **Fine-tuned by**: Wordcab Team |
|
- **Model type**: T5 (Text-to-Text Transfer Transformer) |
|
- **Language**: English |
|
- **License**: Apache 2.0 |
|
- **Demo**: [Try it on Spaces](https://huggingface.co/spaces/wordcab/t5-email-summarizer-demo) |
|
|
|
### Key Features |
|
- **Dual-mode summarization**: Supports both `summarize_brief:` and `summarize_full:` prefixes (see the pipeline sketch below) |
|
- **Robust to informal text**: Handles typos, abbreviations, and casual language |
|
- **Fast inference**: Based on T5-small (60M parameters) |
|
- **Production-ready**: Optimized for real-world email processing |
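
Both prefixes can also be driven through the `transformers` pipeline API; the sketch below is a minimal example (the sample email is a placeholder, not drawn from the training data):

```python
from transformers import pipeline

# Load once and reuse for both modes; the task prefix on the input selects the mode
summarizer = pipeline("summarization", model="wordcab/t5-small-email-summarizer")

email = "Subject: Standup moved. Body: Daily standup moves to 10 AM starting Monday."

brief = summarizer(f"summarize_brief: {email}", max_length=50)[0]["summary_text"]
full = summarizer(f"summarize_full: {email}", max_length=150)[0]["summary_text"]
print(brief)
print(full)
```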
|
|
|
## Intended Uses & Limitations |
|
|
|
### Intended Uses |
|
- Summarizing professional and casual emails |
|
- Email inbox management and prioritization |
|
- Meeting invitation processing |
|
- Customer support email triage |
|
- Newsletter summarization |
|
|
|
### Limitations |
|
- Maximum input length: 512 tokens (~2500 characters) |
|
- English language only |
|
- May miss nuanced context in very long email threads |
|
- Not suitable for legal or medical document summarization |
|
|
|
## Training Data |
|
|
|
The model was fine-tuned on the [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset, which contains 364,000 email-summary pairs covering: |
|
- Professional business emails |
|
- Casual informal communications |
|
- Meeting invitations and updates |
|
- Technical discussions |
|
- Customer service interactions |
|
|
|
Training data was augmented with: |
|
- Subtle typos and misspellings (50% of examples) |
|
- Common abbreviations (pls, thx, tmrw, etc.), as sketched below |
|
- Various formatting styles |
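
For illustration only, abbreviation noise of this kind can be injected with a simple substitution pass. The `add_informal_noise` helper and its abbreviation map below are hypothetical stand-ins, not the augmentation code actually used:

```python
import random

# Hypothetical abbreviation map mirroring the substitutions described above
ABBREVIATIONS = {"please": "pls", "thanks": "thx", "tomorrow": "tmrw", "because": "bc"}

def add_informal_noise(text: str, rate: float = 0.5) -> str:
    """Replace a word with its abbreviation with probability `rate`."""
    noisy = []
    for word in text.split():
        key = word.lower().strip(".,!?")
        if key in ABBREVIATIONS and random.random() < rate:
            noisy.append(ABBREVIATIONS[key])
        else:
            noisy.append(word)
    return " ".join(noisy)

print(add_informal_noise("Thanks, please reply tomorrow because the deadline is close."))
```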
|
|
|
## Training Procedure |
|
|
|
### Training Details |
|
- **Base model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) |
|
- **Training epochs**: 1 |
|
- **Batch size**: 64 |
|
- **Learning rate**: 3e-4 |
|
- **Validation split**: 1% |
|
- **Hardware**: NVIDIA GPU with mixed precision (fp16) |
|
- **Framework**: HuggingFace Transformers 4.36.0 |
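
In `transformers` terms, the settings above map roughly onto the `Seq2SeqTrainingArguments` below. This is a reconstruction from the listed values, not the original training script; `output_dir` and the evaluation settings are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-small-email-summarizer",  # assumption
    num_train_epochs=1,
    per_device_train_batch_size=64,
    learning_rate=3e-4,
    fp16=True,                       # mixed precision, as noted above
    predict_with_generate=True,      # assumption: generate summaries during eval
    evaluation_strategy="steps",     # assumption: periodic eval on the 1% split
)
```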
|
|
|
### Preprocessing |
|
- Emails formatted as: `Subject: [subject]. Body: [content]` (see the helper sketch below) |
|
- Two training examples created per email (brief and full summaries) |
|
- Data augmentation applied to 50% of examples |
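
A minimal sketch of this preprocessing step (the helper names are illustrative, not the released pipeline):

```python
def format_email(subject: str, body: str) -> str:
    """Serialize an email into the `Subject: ... Body: ...` format used in training."""
    return f"Subject: {subject}. Body: {body}"

def make_training_pairs(subject, body, brief_summary, full_summary):
    """Create the two prefixed training examples described above."""
    text = format_email(subject, body)
    return [
        {"input": f"summarize_brief: {text}", "target": brief_summary},
        {"input": f"summarize_full: {text}", "target": full_summary},
    ]
```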
|
|
|
## How to Use |
|
|
|
### Installation |
|
```bash |
|
pip install transformers torch |
|
``` |
|
|
|
### Basic Usage |
|
```python |
|
from transformers import T5ForConditionalGeneration, T5Tokenizer |
|
|
|
# Load model and tokenizer |
|
tokenizer = T5Tokenizer.from_pretrained("wordcab/t5-small-email-summarizer") |
|
model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer") |
|
|
|
# Example email |
|
email = """Subject: Team Meeting Tomorrow. Body: Hi everyone, |
|
Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST. |
|
Please prepare your status updates and any blockers you're facing. |
|
We'll also discuss the Q4 roadmap. Thanks!""" |
|
|
|
# Generate brief summary |
|
inputs = tokenizer(f"summarize_brief: {email}", return_tensors="pt", max_length=512, truncation=True) |
|
outputs = model.generate(**inputs, max_length=50, num_beams=2) |
|
brief_summary = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
print(f"Brief: {brief_summary}") |
|
|
|
# Generate full summary |
|
inputs = tokenizer(f"summarize_full: {email}", return_tensors="pt", max_length=512, truncation=True) |
|
outputs = model.generate(**inputs, max_length=150, num_beams=2) |
|
full_summary = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
print(f"Full: {full_summary}") |
|
``` |
|
|
|
### Advanced Usage with Long Emails |
|
|
|
For emails longer than 512 tokens, consider using chunking: |
|
|
|
```python |
|
def summarize_long_email(email, model, tokenizer, mode="brief"):
    """Summarize an email, chunking when it exceeds the 512-token context."""
    prefix = f"summarize_{mode}:" if mode in ["brief", "full"] else "summarize:"
    max_out = 150 if mode == "full" else 50

    def summarize(text):
        inputs = tokenizer(f"{prefix} {text}", return_tensors="pt", max_length=512, truncation=True)
        outputs = model.generate(**inputs, max_length=max_out)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    tokens = tokenizer.encode(email)
    if len(tokens) <= 500:  # leave room for the task prefix
        return summarize(email)

    # One possible strategy (a sketch, not the only option): summarize
    # ~500-token chunks independently, then summarize the concatenation
    chunks = [tokenizer.decode(tokens[i:i + 500], skip_special_tokens=True)
              for i in range(0, len(tokens), 500)]
    partials = " ".join(summarize(chunk) for chunk in chunks)
    return summarize(partials)
|
``` |
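
This map-reduce style pass (summarize each chunk, then summarize the concatenated partial summaries) trades some cross-chunk coherence for full coverage; for emails that fit within the 512-token limit, the direct path is both faster and more faithful.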
|
|
|
## Performance Metrics |
|
|
|
### Evaluation Results |
|
- **ROUGE-L Score**: 0.42 (see the sketch below) |

- **Average inference time**: 0.63 s (brief) and 0.71 s (full) on an NVIDIA T4 GPU |

- **Coherence score on messy inputs**: 80% |

- Brief and full modes produce clearly distinct outputs (full summaries average ~2.5x the length of brief ones) |
|
- Robust handling of informal text and typos |
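
ROUGE scores of this kind can be reproduced with the `evaluate` library; the sketch below uses placeholder strings rather than examples from the actual evaluation set:

```python
import evaluate

rouge = evaluate.load("rouge")

predictions = ["Team meeting tomorrow at 2 PM EST; prepare status updates."]
references = ["Reminder for the weekly team meeting tomorrow at 2 PM EST."]

# `rougeL` is the F-measure reported by the rouge_score backend
scores = rouge.compute(predictions=predictions, references=references)
print(scores["rougeL"])
```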
|
|
|
## Deployment |
|
|
|
### Using HuggingFace Inference API |
|
```python |
|
import requests |
|
|
|
API_URL = "https://api-inference.huggingface.co/models/wordcab/t5-small-email-summarizer" |
|
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"} |
|
|
|
def query(payload): |
|
response = requests.post(API_URL, headers=headers, json=payload) |
|
return response.json() |
|
|
|
output = query({ |
|
"inputs": "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm to discuss the project.", |
|
}) |
|
``` |
|
|
|
### Using Text Generation Inference (TGI) |
|
```bash |
|
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id wordcab/t5-small-email-summarizer \
  --max-input-length 512 \
  --max-total-tokens 662
|
``` |
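
Once the container is up, requests go to TGI's standard `/generate` endpoint; a minimal sketch, assuming the server is reachable at `localhost:8080`:

```python
import requests

# Query the locally deployed TGI server started with the command above
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm.",
        "parameters": {"max_new_tokens": 50},
    },
)
print(response.json()["generated_text"])
```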
|
|
|
## Model Architecture |
|
|
|
``` |
|
T5ForConditionalGeneration( |
|
(shared): Embedding(32128, 512) |
|
(encoder): T5Stack( |
|
(embed_tokens): Embedding(32128, 512) |
|
(block): ModuleList( |
|
(0-5): T5Block(...) |
|
) |
|
(final_layer_norm): T5LayerNorm() |
|
(dropout): Dropout(p=0.1) |
|
) |
|
(decoder): T5Stack(...) |
|
(lm_head): Linear(in_features=512, out_features=32128, bias=False) |
|
) |
|
``` |
|
|
|
## Citation |
|
|
|
If you use this model, please cite: |
|
|
|
```bibtex |
|
@misc{wordcab2025t5email, |
|
title={T5 Email Summarizer - Brief & Full}, |
|
author={Wordcab Team}, |
|
year={2025}, |
|
publisher={HuggingFace}, |
|
url={https://huggingface.co/wordcab/t5-small-email-summarizer} |
|
} |
|
``` |
|
|
|
## License |
|
|
|
This model is released under the Apache 2.0 License. |
|
|
|
## Acknowledgments |
|
|
|
- Based on [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) T5 model |
|
- Fine-tuned on [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset |
|
- Training performed using HuggingFace Transformers |
|
- Special thanks to the open-source community |
|
|
|
## Contact |
|
|
|
For questions or feedback, please open an issue on the [model repository](https://huggingface.co/wordcab/t5-small-email-summarizer/discussions). |