Fix YAML metadata, add base model (Falconsai/text_summarization) and dataset references
README.md (CHANGED)
````diff
@@ -1,9 +1,56 @@
+---
+language: en
+license: apache-2.0
+base_model: Falconsai/text_summarization
+tags:
+- summarization
+- email
+- t5
+- text2text-generation
+- brief-summary
+- full-summary
+datasets:
+- argilla/FinePersonas-Conversations-Email-Summaries
+metrics:
+- rouge
+widget:
+- text: "summarize_brief: Subject: Team Meeting Tomorrow. Body: Hi everyone, Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST. Please prepare your status updates and any blockers you're facing. We'll also discuss the Q4 roadmap. Thanks!"
+  example_title: "Brief Summary"
+- text: "summarize_full: Subject: Project Update. Body: The development team has completed the first phase of the new feature implementation. We've successfully integrated the API, updated the UI components, and conducted initial testing. The performance improvements show a 40% reduction in load time. Next steps include user acceptance testing and documentation updates."
+  example_title: "Full Summary"
+- text: "summarize_brief: Subject: lunch mtg. Body: hey guys, cant make lunch today bc stuck in traffic. can we do tmrw at 1pm instead? lmk what works. thx!"
+  example_title: "Messy Email (Brief)"
+model-index:
+- name: t5-small-email-summarizer
+  results:
+  - task:
+      type: summarization
+      name: Email Summarization
+    dataset:
+      type: argilla/FinePersonas-Conversations-Email-Summaries
+      name: FinePersonas Email Summaries
+    metrics:
+    - type: rouge-l
+      value: 0.42
+      name: ROUGE-L
+pipeline_tag: summarization
+library_name: transformers
+---
+
 # T5 Email Summarizer - Brief & Full
 
 ## Model Description
 
 This is a fine-tuned T5-small model specialized for email summarization. The model can generate both brief (one-line) and detailed (comprehensive) summaries of emails, and is robust to messy, informal inputs with typos and abbreviations.
 
+### Model Details
+- **Base Model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) (T5-small)
+- **Fine-tuned by**: Wordcab Team
+- **Model type**: T5 (Text-to-Text Transfer Transformer)
+- **Language**: English
+- **License**: Apache 2.0
+- **Demo**: [Try it on Spaces](https://huggingface.co/spaces/wordcab/t5-email-summarizer-demo)
+
 ### Key Features
 - **Dual-mode summarization**: Supports both `summarize_brief:` and `summarize_full:` prefixes
 - **Robust to informal text**: Handles typos, abbreviations, and casual language
````
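With `pipeline_tag: summarization` and `library_name: transformers` now declared, the checkpoint can be driven through the high-level `pipeline` API, which is also what backs the widget examples above. A minimal sketch; the repo id comes from the `+` lines in this commit, and the generation length is illustrative:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint via the summarization pipeline.
summarizer = pipeline("summarization", model="wordcab/t5-small-email-summarizer")

# The task prefix selects the summary style, as in the widget examples.
email = (
    "summarize_brief: Subject: lunch mtg. Body: hey guys, cant make lunch "
    "today bc stuck in traffic. can we do tmrw at 1pm instead? lmk what works. thx!"
)

result = summarizer(email, max_length=64)
print(result[0]["summary_text"])
```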
````diff
@@ -42,7 +89,7 @@ Training data was augmented with:
 ## Training Procedure
 
 ### Training Details
-- **Base model**:
+- **Base model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization)
 - **Training epochs**: 1
 - **Batch size**: 64
 - **Learning rate**: 3e-4
````
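The pinned hyperparameters are enough to sketch the rough shape of the fine-tuning run. The card publishes no training code, so the following is an assumption-laden sketch: `Seq2SeqTrainer` as the harness, hypothetical `email`/`summary` column names (check the dataset schema), and illustrative sequence lengths; only the epoch count, batch size, and learning rate come from the card.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Start from the base model named in the diff.
tokenizer = AutoTokenizer.from_pretrained("Falconsai/text_summarization")
model = AutoModelForSeq2SeqLM.from_pretrained("Falconsai/text_summarization")

dataset = load_dataset("argilla/FinePersonas-Conversations-Email-Summaries")

def preprocess(batch):
    # "email" and "summary" are hypothetical column names for illustration.
    inputs = ["summarize_brief: " + text for text in batch["email"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=150, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset["train"].map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

args = Seq2SeqTrainingArguments(
    output_dir="t5-small-email-summarizer",
    num_train_epochs=1,              # from the card
    per_device_train_batch_size=64,  # from the card
    learning_rate=3e-4,              # from the card
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```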
````diff
@@ -67,8 +114,8 @@ pip install transformers torch
 from transformers import T5ForConditionalGeneration, T5Tokenizer
 
 # Load model and tokenizer
-tokenizer = T5Tokenizer.from_pretrained("
-model = T5ForConditionalGeneration.from_pretrained("
+tokenizer = T5Tokenizer.from_pretrained("wordcab/t5-small-email-summarizer")
+model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")
 
 # Example email
 email = """Subject: Team Meeting Tomorrow. Body: Hi everyone,
````
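The usage snippet is truncated by the hunk boundary, so here is a self-contained sketch of the dual-prefix interface the Key Features section describes; the generation parameters (`max_new_tokens`, `num_beams`) are illustrative rather than the card's:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("wordcab/t5-small-email-summarizer")
model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")

email = ("Subject: Team Meeting Tomorrow. Body: Hi everyone, we meet at 2 PM EST "
         "to review status updates and the Q4 roadmap.")

# The same email, summarized in both modes via the task prefix.
for prefix in ("summarize_brief: ", "summarize_full: "):
    inputs = tokenizer(prefix + email, return_tensors="pt", max_length=512, truncation=True)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=150, num_beams=4)
    print(prefix, tokenizer.decode(output_ids[0], skip_special_tokens=True))
```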
````diff
@@ -116,7 +163,7 @@ def summarize_long_email(email, model, tokenizer, mode="brief"):
 - **Coherence score on messy inputs**: 80%
 
 ### Comparison with Base Model
-- 8.3% improvement in quality over base Falconsai/text_summarization
+- 8.3% improvement in quality over base [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization)
 - Successfully differentiates brief vs full summaries (2.5x length difference)
 - Better handling of informal text and typos
 
````
````diff
@@ -126,7 +173,7 @@ def summarize_long_email(email, model, tokenizer, mode="brief"):
 ```python
 import requests
 
-API_URL = "https://api-inference.huggingface.co/models/
+API_URL = "https://api-inference.huggingface.co/models/wordcab/t5-small-email-summarizer"
 headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}
 
 def query(payload):
````
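The body of `query` falls outside the hunk; the conventional completion for the hosted Inference API looks like the sketch below (the `{"inputs": ...}` payload is the standard serverless API format, and `YOUR_HF_TOKEN` is a placeholder):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/wordcab/t5-small-email-summarizer"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

def query(payload):
    # POST the JSON payload and return the decoded JSON response.
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()

output = query({"inputs": "summarize_brief: Subject: lunch mtg. Body: cant make lunch today, can we do tmrw at 1pm?"})
print(output)
```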
````diff
@@ -143,7 +190,7 @@ output = query({
 docker run --gpus all -p 8080:80 \
   -v t5-small-email-summarizer:/model \
   ghcr.io/huggingface/text-generation-inference:latest \
-  --model-id
+  --model-id wordcab/t5-small-email-summarizer \
   --max-input-length 512 \
   --max-total-tokens 662
 ```
````
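Once the container is up, text-generation-inference exposes a `/generate` route on the mapped port. A quick smoke test, assuming TGI's standard request schema (and that this T5 checkpoint is among the architectures the TGI build supports):

```python
import requests

# Port 8080 is the host port mapped by `docker run -p 8080:80`.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "summarize_brief: Subject: lunch mtg. Body: cant make lunch today, can we do tmrw at 1pm?",
        "parameters": {"max_new_tokens": 150},
    },
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```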
````diff
@@ -176,7 +223,7 @@ If you use this model, please cite:
   author={Wordcab Team},
   year={2025},
   publisher={HuggingFace},
-  url={https://huggingface.co/
+  url={https://huggingface.co/wordcab/t5-small-email-summarizer}
 }
 ```
 
````
````diff
@@ -186,11 +233,11 @@ This model is released under the Apache 2.0 License.
 
 ## Acknowledgments
 
-- Based on
-- Fine-tuned
-- Training
+- Based on [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) T5 model
+- Fine-tuned on [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset
+- Training performed using HuggingFace Transformers
 - Special thanks to the open-source community
 
 ## Contact
 
-For questions or feedback, please open an issue on the [model repository](https://huggingface.co/
+For questions or feedback, please open an issue on the [model repository](https://huggingface.co/wordcab/t5-small-email-summarizer/discussions).
````