---
language: en
license: apache-2.0
base_model: Falconsai/text_summarization
tags:
- summarization
- email
- t5
- text2text-generation
- brief-summary
- full-summary
datasets:
- argilla/FinePersonas-Conversations-Email-Summaries
metrics:
- rouge
widget:
- text: "summarize_brief: Subject: Team Meeting Tomorrow. Body: Hi everyone, Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST. Please prepare your status updates and any blockers you're facing. We'll also discuss the Q4 roadmap. Thanks!"
  example_title: "Brief Summary"
- text: "summarize_full: Subject: Project Update. Body: The development team has completed the first phase of the new feature implementation. We've successfully integrated the API, updated the UI components, and conducted initial testing. The performance improvements show a 40% reduction in load time. Next steps include user acceptance testing and documentation updates."
  example_title: "Full Summary"
- text: "summarize_brief: Subject: lunch mtg. Body: hey guys, cant make lunch today bc stuck in traffic. can we do tmrw at 1pm instead? lmk what works. thx!"
  example_title: "Messy Email (Brief)"
model-index:
- name: t5-small-email-summarizer
  results:
  - task:
      type: summarization
      name: Email Summarization
    dataset:
      type: argilla/FinePersonas-Conversations-Email-Summaries
      name: FinePersonas Email Summaries
    metrics:
    - type: rouge-l
      value: 0.42
      name: ROUGE-L
pipeline_tag: summarization
library_name: transformers
---

# T5 Email Summarizer - Brief & Full

## Model Description

This is a fine-tuned T5-small model specialized for email summarization. The model can generate both brief (one-line) and detailed (comprehensive) summaries of emails, and is robust to messy, informal inputs with typos and abbreviations.
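The two summary styles are selected purely through the input text: the email is rendered in a `Subject: …. Body: …` template and prefixed with either `summarize_brief:` or `summarize_full:`. A minimal sketch of that input construction (the helper name `build_input` is ours for illustration, not part of the model's API):

```python
def build_input(subject: str, body: str, mode: str = "brief") -> str:
    """Format an email the way this card describes, with a task prefix.

    `mode` selects between the one-line ("brief") and detailed ("full")
    summary styles exposed by the fine-tuned checkpoint.
    """
    if mode not in ("brief", "full"):
        raise ValueError("mode must be 'brief' or 'full'")
    return f"summarize_{mode}: Subject: {subject}. Body: {body}"

print(build_input("Team Meeting Tomorrow", "See you at 2 PM EST.", "brief"))
# → summarize_brief: Subject: Team Meeting Tomorrow. Body: See you at 2 PM EST.
```

The resulting string is what you pass to the tokenizer; the usage examples later in this card follow the same pattern.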
### Model Details

- **Base Model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) (T5-small)
- **Fine-tuned by**: Wordcab Team
- **Model type**: T5 (Text-to-Text Transfer Transformer)
- **Language**: English
- **License**: Apache 2.0
- **Demo**: [Try it on Spaces](https://huggingface.co/spaces/wordcab/t5-email-summarizer-demo)

### Key Features

- **Dual-mode summarization**: supports both `summarize_brief:` and `summarize_full:` prefixes
- **Robust to informal text**: handles typos, abbreviations, and casual language
- **Fast inference**: based on T5-small (60M parameters)
- **Production-ready**: optimized for real-world email processing

## Intended Uses & Limitations

### Intended Uses

- Summarizing professional and casual emails
- Email inbox management and prioritization
- Meeting invitation processing
- Customer support email triage
- Newsletter summarization

### Limitations

- Maximum input length: 512 tokens (~2,500 characters)
- English language only
- May miss nuanced context in very long email threads
- Not suitable for legal or medical document summarization

## Training Data

The model was fine-tuned on the [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset, containing 364,000 email-summary pairs that cover:

- Professional business emails
- Casual informal communications
- Meeting invitations and updates
- Technical discussions
- Customer service interactions

Training data was augmented with:

- Subtle typos and misspellings (50% of examples)
- Common abbreviations (pls, thx, tmrw, etc.)
- Various formatting styles

## Training Procedure

### Training Details

- **Base model**: [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization)
- **Training epochs**: 1
- **Batch size**: 64
- **Learning rate**: 3e-4
- **Validation split**: 1%
- **Hardware**: NVIDIA GPU with mixed precision (fp16)
- **Framework**: HuggingFace Transformers 4.36.0

### Preprocessing

- Emails formatted as: `Subject: [subject]. Body: [content]`
- Two training examples created per email (brief and full summaries)
- Data augmentation applied to 50% of examples

## How to Use

### Installation

```bash
pip install transformers torch sentencepiece
```

### Basic Usage

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("wordcab/t5-small-email-summarizer")
model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")

# Example email
email = """Subject: Team Meeting Tomorrow. Body: Hi everyone, Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST. Please prepare your status updates and any blockers you're facing. We'll also discuss the Q4 roadmap. Thanks!"""

# Generate brief summary
inputs = tokenizer(f"summarize_brief: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=50, num_beams=2)
brief_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Brief: {brief_summary}")

# Generate full summary
inputs = tokenizer(f"summarize_full: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=150, num_beams=2)
full_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Full: {full_summary}")
```

### Advanced Usage with Long Emails

For emails longer than 512 tokens, consider using chunking. The sketch below summarizes fixed-size token chunks independently and joins the partial summaries; adapt the strategy to your use case:

```python
def summarize_long_email(email, model, tokenizer, mode="brief"):
    prefix = f"summarize_{mode}:" if mode in ["brief", "full"] else "summarize:"
    max_summary_length = 150 if mode == "full" else 50

    # Check if the email fits in the context window
    tokens = tokenizer.encode(email)
    if len(tokens) <= 500:  # leave room for the task prefix
        # Direct summarization
        inputs = tokenizer(f"{prefix} {email}", return_tensors="pt",
                           max_length=512, truncation=True)
        outputs = model.generate(**inputs, max_length=max_summary_length)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    # For longer emails, summarize each chunk and join the results
    # (a simple map-reduce strategy; strategic truncation is another option)
    chunk_summaries = []
    for start in range(0, len(tokens), 500):
        chunk = tokenizer.decode(tokens[start:start + 500], skip_special_tokens=True)
        inputs = tokenizer(f"{prefix} {chunk}", return_tensors="pt",
                           max_length=512, truncation=True)
        outputs = model.generate(**inputs, max_length=max_summary_length)
        chunk_summaries.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    return " ".join(chunk_summaries)
```

## Performance Metrics

### Evaluation Results

- **ROUGE-L score**: 0.42
- **Average inference time**: 0.63 s (brief), 0.71 s (full) on a T4 GPU
- **Coherence score on messy inputs**: 80%
- Successfully differentiates brief vs. full summaries (2.5x length difference)
- Robust handling of informal text and typos

## Deployment

### Using HuggingFace Inference API

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/wordcab/t5-small-email-summarizer"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm to discuss the project.",
})
```

### Using Text Generation Inference (TGI)

```bash
docker run --gpus all -p 8080:80 \
  -v t5-small-email-summarizer:/model \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id wordcab/t5-small-email-summarizer \
  --max-input-length 512 \
  --max-total-tokens 662
```

## Model Architecture

```
T5ForConditionalGeneration(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 512)
    (block): ModuleList(
      (0-5): T5Block(...)
    )
    (final_layer_norm): T5LayerNorm()
    (dropout): Dropout(p=0.1)
  )
  (decoder): T5Stack(...)
  (lm_head): Linear(in_features=512, out_features=32128, bias=False)
)
```

## Citation

If you use this model, please cite:

```bibtex
@misc{wordcab2025t5email,
  title={T5 Email Summarizer - Brief & Full},
  author={Wordcab Team},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/wordcab/t5-small-email-summarizer}
}
```

## License

This model is released under the Apache 2.0 License.
## Acknowledgments

- Based on the [Falconsai/text_summarization](https://huggingface.co/Falconsai/text_summarization) T5 model
- Fine-tuned on the [argilla/FinePersonas-Conversations-Email-Summaries](https://huggingface.co/datasets/argilla/FinePersonas-Conversations-Email-Summaries) dataset
- Training performed using HuggingFace Transformers
- Special thanks to the open-source community

## Contact

For questions or feedback, please open an issue on the [model repository](https://huggingface.co/wordcab/t5-small-email-summarizer/discussions).