---
language: en
tags:
- summarization
- transformers
- t5
- youtube
license: apache-2.0
datasets:
- custom
model-index:
- name: T5 YouTube Summarizer
  results: []
---

# 📺 T5 YouTube Summarizer

This is a fine-tuned [`t5-base`](https://huggingface.co/t5-base) model for abstractive summarization of YouTube video transcripts. It was trained on a custom dataset of video transcripts paired with manually written summaries.

---

## ✨ Model Details

- **Base Model**: [`t5-base`](https://huggingface.co/t5-base)
- **Task**: Abstractive summarization
- **Training Data**: YouTube video transcripts with human-written summaries
- **Max Input Length**: 512 tokens (longer inputs are truncated; a chunking workaround is sketched below)
- **Max Output Length**: 256 tokens
- **Fine-tuning Epochs**: 10
- **Tokenizer**: `T5Tokenizer` (pretrained)

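Because inputs beyond 512 tokens are truncated, long transcripts lose content unless they are split up first. The sketch below shows one common workaround, not part of the released model: chunk the transcript by token count, summarize each chunk, and join the results. The `summarize_chunk` and `summarize_long` helpers and the 450-token chunk size (chosen to leave room for the `summarize:` prefix) are illustrative assumptions.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("your-username/t5-youtube-summarizer")
tokenizer = T5Tokenizer.from_pretrained("your-username/t5-youtube-summarizer")

def summarize_chunk(chunk: str) -> str:
    # Summarize one chunk that already fits within the 512-token input limit
    inputs = tokenizer.encode("summarize: " + chunk, return_tensors="pt",
                              max_length=512, truncation=True)
    ids = model.generate(inputs, max_length=256, num_beams=5,
                         no_repeat_ngram_size=3, early_stopping=True)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

def summarize_long(text: str, chunk_tokens: int = 450) -> str:
    # Tokenize once, split the ids into fixed-size windows, decode each
    # window back to text, and summarize it independently
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = [tokenizer.decode(token_ids[i:i + chunk_tokens])
              for i in range(0, len(token_ids), chunk_tokens)]
    # Join the per-chunk summaries into one overall summary
    return " ".join(summarize_chunk(chunk) for chunk in chunks)
```

---
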
## 🧠 Intended Use

This model is designed to generate short, informative summaries from long transcripts of educational or conceptual YouTube videos. It can be used for:

- Quickly understanding long videos without watching them in full
- Automated content summaries for blogs, platforms, or note-taking tools
- Enhancing accessibility of long-form spoken content

---

## 🚀 How to Use

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("your-username/t5-youtube-summarizer")
tokenizer = T5Tokenizer.from_pretrained("your-username/t5-youtube-summarizer")

# Define the transcript to summarize
text = "The video talks about coordinate covalent bonds, giving examples from..."

# T5 expects a task prefix; tokenize and truncate to the 512-token input limit
inputs = tokenizer.encode("summarize: " + text, return_tensors="pt",
                          max_length=512, truncation=True)

# Generate the summary with beam search
summary_ids = model.generate(
    inputs,
    max_length=256,
    min_length=80,
    num_beams=5,
    length_penalty=2.0,
    no_repeat_ngram_size=3,
    early_stopping=True,
)

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```

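To summarize a real video, you first need its transcript. One option, sketched below, is the third-party `youtube-transcript-api` package; this is an assumption rather than a dependency of the model, and the `get_transcript` call shown here is the pre-1.0 API, which returns a list of `{"text": ...}` segments. Any other transcript source works the same way.

```python
# Assumes youtube-transcript-api < 1.0, whose get_transcript() returns a
# list of {"text": ...} segments; any transcript source can substitute.
from youtube_transcript_api import YouTubeTranscriptApi

video_id = "VIDEO_ID"  # replace with the ID of the target video
segments = YouTubeTranscriptApi.get_transcript(video_id)
transcript = " ".join(segment["text"] for segment in segments)

# Reuse the model and tokenizer loaded in the snippet above
inputs = tokenizer.encode("summarize: " + transcript, return_tensors="pt",
                          max_length=512, truncation=True)
summary_ids = model.generate(inputs, max_length=256, min_length=80, num_beams=5,
                             length_penalty=2.0, no_repeat_ngram_size=3,
                             early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```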