---
language: en
license: apache-2.0
base_model: Falconsai/text_summarization
tags:
- summarization
- email
- t5
- text2text-generation
- brief-summary
- full-summary
datasets:
- argilla/FinePersonas-Conversations-Email-Summaries
metrics:
- rouge
widget:
- text: "summarize_brief: Subject: Team Meeting Tomorrow. Body: Hi everyone, Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST. Please prepare your status updates and any blockers you're facing. We'll also discuss the Q4 roadmap. Thanks!"
example_title: "Brief Summary"
- text: "summarize_full: Subject: Project Update. Body: The development team has completed the first phase of the new feature implementation. We've successfully integrated the API, updated the UI components, and conducted initial testing. The performance improvements show a 40% reduction in load time. Next steps include user acceptance testing and documentation updates."
example_title: "Full Summary"
- text: "summarize_brief: Subject: lunch mtg. Body: hey guys, cant make lunch today bc stuck in traffic. can we do tmrw at 1pm instead? lmk what works. thx!"
example_title: "Messy Email (Brief)"
model-index:
- name: t5-small-email-summarizer
results:
- task:
type: summarization
name: Email Summarization
dataset:
type: argilla/FinePersonas-Conversations-Email-Summaries
name: FinePersonas Email Summaries
metrics:
- type: rouge-l
value: 0.42
name: ROUGE-L
pipeline_tag: summarization
library_name: transformers
---
# T5 Email Summarizer - Brief & Full
## Model Description
This is a T5-small model fine-tuned for email summarization. It generates both brief (one-line) and full (detailed) summaries of emails, and is robust to messy, informal inputs with typos and abbreviations.
### Model Details
- **Base Model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) (T5-small)
- **Fine-tuned by**: Wordcab Team
- **Model type**: T5 (Text-to-Text Transfer Transformer)
- **Language**: English
- **License**: Apache 2.0
- **Demo**: [Try it on Spaces](https://huggingface.co/spaces/wordcab/t5-email-summarizer-demo)
### Key Features
- **Dual-mode summarization**: Supports both `summarize_brief:` and `summarize_full:` prefixes
- **Robust to informal text**: Handles typos, abbreviations, and casual language
- **Fast inference**: Based on T5-small (60M parameters)
- **Production-ready**: Optimized for real-world email processing
## Intended Uses & Limitations
### Intended Uses
- Summarizing professional and casual emails
- Email inbox management and prioritization
- Meeting invitation processing
- Customer support email triage
- Newsletter summarization
### Limitations
- Maximum input length: 512 tokens (~2500 characters)
- English language only
- May miss nuanced context in very long email threads
- Not suitable for legal or medical document summarization
## Training Data
The model was fine-tuned on the [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset containing 364,000 email-summary pairs with:
- Professional business emails
- Casual informal communications
- Meeting invitations and updates
- Technical discussions
- Customer service interactions
Training data was augmented with (see the sketch after this list):
- Subtle typos and misspellings (50% of examples)
- Common abbreviations (pls, thx, tmrw, etc.)
- Various formatting styles
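The augmentation code is not published with the model; the sketch below shows what this kind of augmentation could look like, assuming a simple adjacent-character swap for typos and a hypothetical abbreviation map:
```python
import random

# Hypothetical abbreviation map; the actual training pipeline may differ
ABBREVIATIONS = {"please": "pls", "thanks": "thx", "tomorrow": "tmrw", "because": "bc"}

def augment(text: str, typo_prob: float = 0.5) -> str:
    """Abbreviate common words, then add a subtle typo to ~50% of examples."""
    for word, abbr in ABBREVIATIONS.items():
        text = text.replace(word, abbr)
    if random.random() < typo_prob and len(text) > 3:
        i = random.randrange(len(text) - 1)
        text = text[:i] + text[i + 1] + text[i] + text[i + 2:]  # swap adjacent chars
    return text

print(augment("please send the report tomorrow, thanks!"))
```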
## Training Procedure
### Training Details
- **Base model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization)
- **Training epochs**: 1
- **Batch size**: 64
- **Learning rate**: 3e-4
- **Validation split**: 1%
- **Hardware**: NVIDIA GPU with mixed precision (fp16)
- **Framework**: HuggingFace Transformers 4.36.0
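The training script itself is not published; under the standard HuggingFace `Seq2SeqTrainer` setup, the hyperparameters above would translate to roughly the following (a sketch; `output_dir` and the eval cadence are placeholders):
```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-small-email-summarizer",  # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=64,
    learning_rate=3e-4,
    fp16=True,                    # mixed precision, as listed above
    predict_with_generate=True,   # generate during eval for ROUGE
    evaluation_strategy="steps",  # evaluated on the 1% validation split
)
```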
### Preprocessing
- Emails formatted as: `Subject: [subject]. Body: [content]`
- Two training examples created per email (brief and full summaries), as sketched below
- Data augmentation applied to 50% of examples
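A minimal sketch of that pairing step (the field names here are hypothetical; the dataset's actual column names may differ):
```python
def make_training_pairs(example):
    """Build the two (input, target) pairs described above from one raw email."""
    text = f"Subject: {example['subject']}. Body: {example['body']}"
    return [
        (f"summarize_brief: {text}", example["brief_summary"]),
        (f"summarize_full: {text}", example["full_summary"]),
    ]
```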
## How to Use
### Installation
```bash
pip install transformers torch sentencepiece
```
### Basic Usage
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("wordcab/t5-small-email-summarizer")
model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")
# Example email
email = """Subject: Team Meeting Tomorrow. Body: Hi everyone,
Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST.
Please prepare your status updates and any blockers you're facing.
We'll also discuss the Q4 roadmap. Thanks!"""
# Generate brief summary
inputs = tokenizer(f"summarize_brief: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=50, num_beams=2)
brief_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Brief: {brief_summary}")
# Generate full summary
inputs = tokenizer(f"summarize_full: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=150, num_beams=2)
full_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Full: {full_summary}")
```
### Advanced Usage with Long Emails
For emails longer than 512 tokens, consider using chunking:
```python
def summarize_long_email(email, model, tokenizer, mode="brief"):
    """Summarize an email, chunking when it exceeds the 512-token context."""
    prefix = f"summarize_{mode}:" if mode in ["brief", "full"] else "summarize:"
    max_len = 150 if mode == "full" else 50

    def summarize(text):
        inputs = tokenizer(f"{prefix} {text}", return_tensors="pt",
                           max_length=512, truncation=True)
        outputs = model.generate(**inputs, max_length=max_len, num_beams=2)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    tokens = tokenizer.encode(email)
    if len(tokens) <= 500:  # leave room for the task prefix
        return summarize(email)

    # One simple strategy for longer emails: summarize ~500-token
    # chunks, then summarize the concatenated chunk summaries
    chunks = [tokenizer.decode(tokens[i:i + 500], skip_special_tokens=True)
              for i in range(0, len(tokens), 500)]
    return summarize(" ".join(summarize(chunk) for chunk in chunks))
```
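Example call, reusing `model` and `tokenizer` from the basic usage above:
```python
print(summarize_long_email(email, model, tokenizer, mode="full"))
```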
## Performance Metrics
### Evaluation Results
- **ROUGE-L Score**: 0.42
- **Average inference time**: 0.63s (brief), 0.71s (full) on T4 GPU
- **Coherence score on messy inputs**: 80%
- Successfully differentiates brief vs full summaries (2.5x length difference)
- Robust handling of informal text and typos
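The ROUGE-L figure can be sanity-checked with the `evaluate` library; a minimal sketch (the prediction/reference pair is illustrative, not drawn from the held-out set):
```python
import evaluate  # pip install evaluate rouge_score

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["Weekly team meeting tomorrow at 2 PM EST; prepare status updates."],
    references=["Reminder about the weekly team meeting tomorrow at 2 PM EST."],
)
print(scores["rougeL"])  # ROUGE-L F1
```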
## Deployment
### Using HuggingFace Inference API
```python
import requests

API_URL = "https://api-inference.huggingface.co/models/wordcab/t5-small-email-summarizer"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}  # replace with your HF access token

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()

output = query({
    "inputs": "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm to discuss the project.",
})
print(output)
```
### Using Text Generation Inference (TGI)
```bash
docker run --gpus all -p 8080:80 \
  -v $PWD/tgi-cache:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id wordcab/t5-small-email-summarizer \
  --max-input-length 512 \
  --max-total-tokens 662
```
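Once the container is up, requests go to TGI's standard `/generate` endpoint (a sketch; adjust host and port to your deployment):
```bash
curl http://localhost:8080/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "summarize_brief: Subject: Meeting. Body: Meet tomorrow at 3pm to discuss the project.", "parameters": {"max_new_tokens": 50}}'
```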
## Model Architecture
```
T5ForConditionalGeneration(
(shared): Embedding(32128, 512)
(encoder): T5Stack(
(embed_tokens): Embedding(32128, 512)
(block): ModuleList(
(0-5): T5Block(...)
)
(final_layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1)
)
(decoder): T5Stack(...)
(lm_head): Linear(in_features=512, out_features=32128, bias=False)
)
```
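A quick way to confirm the ~60M parameter count mentioned above:
```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")
print(f"{sum(p.numel() for p in model.parameters()):,}")  # ~60M for T5-small
```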
## Citation
If you use this model, please cite:
```bibtex
@misc{wordcab2025t5email,
title={T5 Email Summarizer - Brief & Full},
author={Wordcab Team},
year={2025},
publisher={HuggingFace},
url={https://huggingface.co/wordcab/t5-small-email-summarizer}
}
```
## License
This model is released under the Apache 2.0 License.
## Acknowledgments
- Based on [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) T5 model
- Fine-tuned on [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset
- Training performed using HuggingFace Transformers
- Special thanks to the open-source community
## Contact
For questions or feedback, please open an issue on the [model repository](https://huggingface.co/wordcab/t5-small-email-summarizer/discussions). |