---
language: en
license: apache-2.0
base_model: Falconsai/text_summarization
tags:
- summarization
- email
- t5
- text2text-generation
- brief-summary
- full-summary
datasets:
- argilla/FinePersonas-Conversations-Email-Summaries
metrics:
- rouge
widget:
- text: "summarize_brief: Subject: Team Meeting Tomorrow. Body: Hi everyone, Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST. Please prepare your status updates and any blockers you're facing. We'll also discuss the Q4 roadmap. Thanks!"
  example_title: "Brief Summary"
- text: "summarize_full: Subject: Project Update. Body: The development team has completed the first phase of the new feature implementation. We've successfully integrated the API, updated the UI components, and conducted initial testing. The performance improvements show a 40% reduction in load time. Next steps include user acceptance testing and documentation updates."
  example_title: "Full Summary"
- text: "summarize_brief: Subject: lunch mtg. Body: hey guys, cant make lunch today bc stuck in traffic. can we do tmrw at 1pm instead? lmk what works. thx!"
  example_title: "Messy Email (Brief)"
model-index:
- name: t5-small-email-summarizer
  results:
  - task:
      type: summarization
      name: Email Summarization
    dataset:
      type: argilla/FinePersonas-Conversations-Email-Summaries
      name: FinePersonas Email Summaries
    metrics:
    - type: rouge-l
      value: 0.42
      name: ROUGE-L
pipeline_tag: summarization
library_name: transformers
---

# T5 Email Summarizer - Brief & Full

## Model Description

This is a fine-tuned T5-small model specialized for email summarization. The model can generate both brief (one-line) and detailed (comprehensive) summaries of emails, and is robust to messy, informal inputs with typos and abbreviations.

### Model Details
- **Base Model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) (T5-small)
- **Fine-tuned by**: Wordcab Team
- **Model type**: T5 (Text-to-Text Transfer Transformer)
- **Language**: English
- **License**: Apache 2.0
- **Demo**: [Try it on Spaces](https://huggingface.co/spaces/wordcab/t5-email-summarizer-demo)

### Key Features
- **Dual-mode summarization**: Supports both `summarize_brief:` and `summarize_full:` prefixes
- **Robust to informal text**: Handles typos, abbreviations, and casual language
- **Fast inference**: Based on T5-small (60M parameters)
- **Production-ready**: Optimized for real-world email processing

## Intended Uses & Limitations

### Intended Uses
- Summarizing professional and casual emails
- Email inbox management and prioritization
- Meeting invitation processing
- Customer support email triage
- Newsletter summarization

### Limitations
- Maximum input length: 512 tokens (~2500 characters)
- English language only
- May miss nuanced context in very long email threads
- Not suitable for legal or medical document summarization

## Training Data

The model was fine-tuned on the [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset of 364,000 email-summary pairs, covering:
- Professional business emails
- Casual informal communications
- Meeting invitations and updates
- Technical discussions
- Customer service interactions

Training data was augmented with:
- Subtle typos and misspellings (50% of examples)
- Common abbreviations (pls, thx, tmrw, etc.)
- Various formatting styles
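
The exact augmentation code is not published; the sketch below is a hypothetical reconstruction of what such a step might look like (the `augment_email` helper, abbreviation table, and typo logic are illustrative assumptions, not the actual training code):

```python
import random

# Illustrative abbreviation map matching the examples listed above
ABBREVIATIONS = {"please": "pls", "thanks": "thx", "tomorrow": "tmrw"}

def augment_email(text: str, p: float = 0.5) -> str:
    """Hypothetical augmentation: abbreviate words and add one subtle typo to ~50% of examples."""
    if random.random() > p:
        return text  # leave the other half of the examples untouched
    words = [ABBREVIATIONS.get(w.lower(), w) for w in text.split()]
    # Introduce one subtle typo: drop a character from a random longer word
    idx = random.randrange(len(words))
    if len(words[idx]) > 3:
        drop = random.randrange(len(words[idx]))
        words[idx] = words[idx][:drop] + words[idx][drop + 1:]
    return " ".join(words)
```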

## Training Procedure

### Training Details
- **Base model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization)
- **Training epochs**: 1
- **Batch size**: 64
- **Learning rate**: 3e-4
- **Validation split**: 1%
- **Hardware**: NVIDIA GPU with mixed precision (fp16)
- **Framework**: HuggingFace Transformers 4.36.0
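
These hyperparameters map onto a standard `Seq2SeqTrainingArguments` setup; a hedged reconstruction (the output directory and logging settings are assumptions, not taken from the original run):

```python
from transformers import Seq2SeqTrainingArguments

# Reconstruction of the hyperparameters listed above; paths and logging are assumed.
training_args = Seq2SeqTrainingArguments(
    output_dir="t5-small-email-summarizer",
    num_train_epochs=1,
    per_device_train_batch_size=64,
    learning_rate=3e-4,
    fp16=True,  # mixed precision, as listed above
    logging_steps=100,
)
```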

### Preprocessing
- Emails formatted as: `Subject: [subject]. Body: [content]`
- Two training examples created per email (brief and full summaries)
- Data augmentation applied to 50% of examples
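
Concretely, each dataset row expands into two training pairs along these lines (the function and field names are illustrative):

```python
def build_examples(subject, body, brief_summary, full_summary):
    """Turn one email into the two (input, target) pairs used for training."""
    email = f"Subject: {subject}. Body: {body}"
    return [
        (f"summarize_brief: {email}", brief_summary),
        (f"summarize_full: {email}", full_summary),
    ]
```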

## How to Use

### Installation
```bash
pip install transformers torch sentencepiece
```

### Basic Usage
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("wordcab/t5-small-email-summarizer")
model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")

# Example email
email = """Subject: Team Meeting Tomorrow. Body: Hi everyone, 
Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST. 
Please prepare your status updates and any blockers you're facing. 
We'll also discuss the Q4 roadmap. Thanks!"""

# Generate brief summary
inputs = tokenizer(f"summarize_brief: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=50, num_beams=2)
brief_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Brief: {brief_summary}")

# Generate full summary
inputs = tokenizer(f"summarize_full: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=150, num_beams=2)
full_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Full: {full_summary}")
```
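
Since the card declares `pipeline_tag: summarization`, the high-level `pipeline` API should work as well; a minimal sketch:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="wordcab/t5-small-email-summarizer")
result = summarizer(
    "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm.",
    max_length=50,
)
print(result[0]["summary_text"])
```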

### Advanced Usage with Long Emails

For emails longer than 512 tokens, consider chunking. The sketch below uses one simple strategy: summarize each ~500-token chunk, then summarize the concatenated chunk summaries:

```python
def summarize_long_email(email, model, tokenizer, mode="brief"):
    """Summarize an email, falling back to chunking when it exceeds the context."""
    prefix = f"summarize_{mode}:" if mode in ["brief", "full"] else "summarize:"
    max_summary = 150 if mode == "full" else 50

    def summarize(text):
        inputs = tokenizer(f"{prefix} {text}", return_tensors="pt",
                           max_length=512, truncation=True)
        outputs = model.generate(**inputs, max_length=max_summary, num_beams=2)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    tokens = tokenizer.encode(email)
    if len(tokens) <= 500:  # leave room for the prefix
        return summarize(email)

    # One simple chunking strategy: summarize each ~500-token chunk,
    # then summarize the concatenation of the chunk summaries.
    chunks = [tokenizer.decode(tokens[i:i + 500], skip_special_tokens=True)
              for i in range(0, len(tokens), 500)]
    return summarize(" ".join(summarize(chunk) for chunk in chunks))
```

## Performance Metrics

### Evaluation Results
- **ROUGE-L Score**: 0.42
- **Average inference time**: 0.63s (brief), 0.71s (full) on T4 GPU
- **Coherence score on messy inputs**: 80%
- Clearly differentiates the two modes: full summaries average ~2.5x the length of brief ones
- Robust handling of informal text and typos
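
The ROUGE-L figure above can be recomputed with the `evaluate` library (requires `pip install evaluate rouge_score`); the prediction/reference strings here are placeholders:

```python
import evaluate

rouge = evaluate.load("rouge")
predictions = ["Weekly team meeting tomorrow at 2 PM EST; bring status updates."]
references = ["Reminder about the weekly team meeting tomorrow at 2 PM EST to discuss updates and the Q4 roadmap."]
print(rouge.compute(predictions=predictions, references=references)["rougeL"])
```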

## Deployment

### Using HuggingFace Inference API
```python
import requests

API_URL = "https://api-inference.huggingface.co/models/wordcab/t5-small-email-summarizer"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm to discuss the project.",
})
```

### Using Text Generation Inference (TGI)
```bash
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id wordcab/t5-small-email-summarizer \
  --max-input-length 512 \
  --max-total-tokens 662
```
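
Once the container is up, requests go to TGI's `/generate` endpoint; a minimal Python client (host and port match the mapping above):

```python
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm.",
        "parameters": {"max_new_tokens": 50},
    },
)
print(resp.json()["generated_text"])
```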

## Model Architecture

```
T5ForConditionalGeneration(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 512)
    (block): ModuleList(
      (0-5): T5Block(...)
    )
    (final_layer_norm): T5LayerNorm()
    (dropout): Dropout(p=0.1)
  )
  (decoder): T5Stack(...)
  (lm_head): Linear(in_features=512, out_features=32128, bias=False)
)
```
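
To sanity-check the ~60M-parameter figure cited above, you can count the parameters directly:

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")  # ~60M for T5-small
```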

## Citation

If you use this model, please cite:

```bibtex
@misc{wordcab2025t5email,
  title={T5 Email Summarizer - Brief & Full},
  author={Wordcab Team},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/wordcab/t5-small-email-summarizer}
}
```

## License

This model is released under the Apache 2.0 License.

## Acknowledgments

- Based on [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) T5 model
- Fine-tuned on [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset
- Training performed using HuggingFace Transformers
- Special thanks to the open-source community

## Contact

For questions or feedback, please open an issue on the [model repository](https://huggingface.co/wordcab/t5-small-email-summarizer/discussions).