aleks-wordcab committed (verified)
Commit d67fb0f · Parent: fd37ee2

Fix YAML metadata, add base model (Falconsai/text_summarization) and dataset references

Files changed (1): README.md (+58, -11)
README.md CHANGED
@@ -1,9 +1,56 @@
+---
+language: en
+license: apache-2.0
+base_model: Falconsai/text_summarization
+tags:
+- summarization
+- email
+- t5
+- text2text-generation
+- brief-summary
+- full-summary
+datasets:
+- argilla/FinePersonas-Conversations-Email-Summaries
+metrics:
+- rouge
+widget:
+- text: "summarize_brief: Subject: Team Meeting Tomorrow. Body: Hi everyone, Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST. Please prepare your status updates and any blockers you're facing. We'll also discuss the Q4 roadmap. Thanks!"
+  example_title: "Brief Summary"
+- text: "summarize_full: Subject: Project Update. Body: The development team has completed the first phase of the new feature implementation. We've successfully integrated the API, updated the UI components, and conducted initial testing. The performance improvements show a 40% reduction in load time. Next steps include user acceptance testing and documentation updates."
+  example_title: "Full Summary"
+- text: "summarize_brief: Subject: lunch mtg. Body: hey guys, cant make lunch today bc stuck in traffic. can we do tmrw at 1pm instead? lmk what works. thx!"
+  example_title: "Messy Email (Brief)"
+model-index:
+- name: t5-small-email-summarizer
+  results:
+  - task:
+      type: summarization
+      name: Email Summarization
+    dataset:
+      type: argilla/FinePersonas-Conversations-Email-Summaries
+      name: FinePersonas Email Summaries
+    metrics:
+    - type: rouge-l
+      value: 0.42
+      name: ROUGE-L
+pipeline_tag: summarization
+library_name: transformers
+---
+
 # T5 Email Summarizer - Brief & Full
 
 ## Model Description
 
 This is a fine-tuned T5-small model specialized for email summarization. The model can generate both brief (one-line) and detailed (comprehensive) summaries of emails, and is robust to messy, informal inputs with typos and abbreviations.
 
+### Model Details
+- **Base Model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) (T5-small)
+- **Fine-tuned by**: Wordcab Team
+- **Model type**: T5 (Text-to-Text Transfer Transformer)
+- **Language**: English
+- **License**: Apache 2.0
+- **Demo**: [Try it on Spaces](https://huggingface.co/spaces/wordcab/t5-email-summarizer-demo)
+
 ### Key Features
 - **Dual-mode summarization**: Supports both `summarize_brief:` and `summarize_full:` prefixes
 - **Robust to informal text**: Handles typos, abbreviations, and casual language
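The widget examples in the new metadata double as a quick smoke test of the dual-prefix convention. A minimal sketch using the standard `transformers` pipeline (the `max_length` values are illustrative assumptions, not part of this commit):

```python
from transformers import pipeline

# pipeline_tag in the new metadata is "summarization", so the standard
# summarization pipeline applies; brief vs. full is selected purely by
# the text prefix, with no extra arguments.
summarizer = pipeline("summarization", model="wordcab/t5-small-email-summarizer")

email = ("Subject: lunch mtg. Body: hey guys, cant make lunch today bc stuck "
         "in traffic. can we do tmrw at 1pm instead? lmk what works. thx!")

brief = summarizer("summarize_brief: " + email, max_length=32)[0]["summary_text"]
full = summarizer("summarize_full: " + email, max_length=128)[0]["summary_text"]
print(brief)
print(full)
```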
@@ -42,7 +89,7 @@ Training data was augmented with:
 ## Training Procedure
 
 ### Training Details
-- **Base model**: google/t5-v1_1-small
+- **Base model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization)
 - **Training epochs**: 1
 - **Batch size**: 64
 - **Learning rate**: 3e-4
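For anyone reproducing the run, the three hyperparameters in this hunk map directly onto `Seq2SeqTrainingArguments` in HuggingFace Transformers; a sketch in which everything except those three values is a placeholder assumption:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./t5-email-summarizer",  # placeholder path, not from the card
    num_train_epochs=1,                  # "Training epochs: 1"
    per_device_train_batch_size=64,      # "Batch size: 64"
    learning_rate=3e-4,                  # "Learning rate: 3e-4"
    predict_with_generate=True,          # typical for seq2seq summarization
)
```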
@@ -67,8 +114,8 @@ pip install transformers torch
 from transformers import T5ForConditionalGeneration, T5Tokenizer
 
 # Load model and tokenizer
-tokenizer = T5Tokenizer.from_pretrained("Wordcab/t5-small-email-summarizer")
-model = T5ForConditionalGeneration.from_pretrained("Wordcab/t5-small-email-summarizer")
+tokenizer = T5Tokenizer.from_pretrained("wordcab/t5-small-email-summarizer")
+model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")
 
 # Example email
 email = """Subject: Team Meeting Tomorrow. Body: Hi everyone,
@@ -116,7 +163,7 @@ def summarize_long_email(email, model, tokenizer, mode="brief"):
 - **Coherence score on messy inputs**: 80%
 
 ### Comparison with Base Model
-- 8.3% improvement in quality over base Falconsai/text_summarization
+- 8.3% improvement in quality over base [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization)
 - Successfully differentiates brief vs full summaries (2.5x length difference)
 - Better handling of informal text and typos
 
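The ROUGE-L value reported in the new metadata (0.42) is the kind of figure the `evaluate` library produces; a minimal sketch with placeholder data standing in for model outputs on the FinePersonas test split:

```python
import evaluate

# Placeholder prediction/reference pair; in practice these come from
# running the summarizer over the evaluation set.
predictions = ["weekly team meeting tomorrow at 2 pm est"]
references = ["Reminder: weekly team meeting tomorrow at 2 PM EST."]

rouge = evaluate.load("rouge")
scores = rouge.compute(predictions=predictions, references=references)
print(scores["rougeL"])  # ROUGE-L F-measure
```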
@@ -126,7 +173,7 @@ def summarize_long_email(email, model, tokenizer, mode="brief"):
 ```python
 import requests
 
-API_URL = "https://api-inference.huggingface.co/models/Wordcab/t5-small-email-summarizer"
+API_URL = "https://api-inference.huggingface.co/models/wordcab/t5-small-email-summarizer"
 headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}
 
 def query(payload):
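The hunk stops at the `query` definition; the next hunk header (`output = query({`) shows how the README calls it. A sketch of the usual completion, following the standard HF Inference API pattern (the `parameters` values are assumptions):

```python
# Typical completion of query() plus a call; requests, API_URL, and
# headers are defined in the snippet above.
def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "summarize_brief: Subject: Team Meeting Tomorrow. Body: ...",
    "parameters": {"max_new_tokens": 64},  # assumption, not from the card
})
print(output)
```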
@@ -143,7 +190,7 @@ output = query({
 docker run --gpus all -p 8080:80 \
   -v t5-small-email-summarizer:/model \
   ghcr.io/huggingface/text-generation-inference:latest \
-  --model-id Wordcab/t5-small-email-summarizer \
+  --model-id wordcab/t5-small-email-summarizer \
   --max-input-length 512 \
   --max-total-tokens 662
 ```
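Once the container above is running, text-generation-inference exposes its standard `/generate` route on the mapped port; a minimal Python client sketch (the prompt and `max_new_tokens` are assumptions):

```python
import requests

# Query the local TGI container started by the docker command above.
resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": "summarize_brief: Subject: lunch mtg. Body: can we move lunch to 1pm tmrw?",
        "parameters": {"max_new_tokens": 64},
    },
    timeout=60,
)
print(resp.json()["generated_text"])
```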
@@ -176,7 +223,7 @@ If you use this model, please cite:
   author={Wordcab Team},
   year={2025},
   publisher={HuggingFace},
-  url={https://huggingface.co/Wordcab/t5-small-email-summarizer}
+  url={https://huggingface.co/wordcab/t5-small-email-summarizer}
 }
 ```
 
@@ -186,11 +233,11 @@ This model is released under the Apache 2.0 License.
 
 ## Acknowledgments
 
-- Based on Google's T5 architecture
-- Fine-tuned using HuggingFace Transformers
-- Training data derived from public email datasets
+- Based on [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) T5 model
+- Fine-tuned on [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset
+- Training performed using HuggingFace Transformers
 - Special thanks to the open-source community
 
 ## Contact
 
-For questions or feedback, please open an issue on the [model repository](https://huggingface.co/Wordcab/t5-small-email-summarizer/discussions).
+For questions or feedback, please open an issue on the [model repository](https://huggingface.co/wordcab/t5-small-email-summarizer/discussions).
 