---
license: apache-2.0
base_model: facebook/bart-large-cnn
tags:
- summarization
- news
- text2text-generation
- bart-large-cnn
language:
- en
datasets:
- cnn_dailymail
metrics:
- rouge
pipeline_tag: summarization
---
# News Summarizer
This model is fine-tuned to summarize English news articles: given a long article, it generates a concise summary.
## Model Details
- **Base Model**: facebook/bart-large-cnn
- **Task**: Text Summarization
- **Language**: English
- **Training Steps**: 4000
- **Best ROUGE-1**: 0.42
- **Live version on Streamlit**: https://english-news-summarizer.streamlit.app
## Usage
```python
import re

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSeq2SeqLM.from_pretrained("ciorant/news-summarizer")
tokenizer = AutoTokenizer.from_pretrained("ciorant/news-summarizer")


def summarize_news(article_text, max_length=128):
    # Tokenize the article; inputs longer than 512 tokens are truncated
    inputs = tokenizer(article_text, return_tensors="pt", truncation=True, max_length=512)
    # Deterministic beam search (4 beams, no sampling)
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_length=max_length,
        num_beams=4,
        early_stopping=True,
        do_sample=False,
        length_penalty=1.0,
    )
    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Clean up spacing around punctuation
    summary = re.sub(r'\s+([.,!?;:])', r'\1', summary)
    summary = re.sub(r'\s+', ' ', summary)
    return summary.strip()


# Example usage
article = "Your news article text here..."
summary = summarize_news(article)
print(summary)
```
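The punctuation clean-up at the end of `summarize_news` can be checked on its own, without downloading the model. The sketch below pulls the same two regular expressions into a helper; the name `clean_summary` and the sample string are hypothetical and used only for illustration.

```python
import re


def clean_summary(text: str) -> str:
    """Apply the same spacing fixes used at the end of summarize_news."""
    # Remove whitespace that precedes a punctuation mark
    text = re.sub(r'\s+([.,!?;:])', r'\1', text)
    # Collapse any remaining runs of whitespace into a single space
    text = re.sub(r'\s+', ' ', text)
    return text.strip()


raw = "The council met on Monday .  It approved the budget , then adjourned !"
print(clean_summary(raw))
# The council met on Monday. It approved the budget, then adjourned!
```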
## Training Data
Fine-tuned on English news articles for the summarization task. The card metadata lists `cnn_dailymail` as the dataset.
## Performance
- ROUGE-1: ~0.42
- ROUGE-2: ~0.21
- ROUGE-L: ~0.29
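
To make the ROUGE-1 figure concrete, here is a minimal sketch of a unigram-overlap F1 score in plain Python. It assumes the reported scores are F-measures, which this card does not state, and it is a simplification: the function name `rouge1_f1` is hypothetical, tokenization is plain whitespace splitting, and no stemming is applied, so its values will not match a standard ROUGE package exactly.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared word
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 4))  # 0.8333 (5 of 6 unigrams overlap)
```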
## Limitations
- Optimized for English news articles
- Performs best on articles of 100 to 800 words
- The usage example truncates inputs to 512 tokens, so any text beyond that limit is ignored
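
For articles longer than the range above, one possible workaround is to split the text into word-bounded chunks and summarize each chunk separately. The sketch below shows only the splitting step. `chunk_article` is a hypothetical helper, not part of this model or of `transformers`, and its default of 800 words is taken from the list above. Word count is only a rough proxy for token count, so a smaller `max_words` may be needed to stay within the tokenizer limit.

```python
from typing import List


def chunk_article(text: str, max_words: int = 800) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


chunks = chunk_article("one two three four five", max_words=2)
print(chunks)  # ['one two', 'three four', 'five']
```

Each chunk could then be passed to a summarization function in turn. Splitting on word counts can cut a sentence in half, so splitting on sentence or paragraph boundaries may give better results.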