---
license: apache-2.0
base_model: facebook/bart-large-cnn
tags:
- summarization
- news
- text2text-generation
- bart-large-cnn
language:
- en
datasets:
- cnn_dailymail
metrics:
- rouge
pipeline_tag: summarization
---

# News Summarizer

This model is fine-tuned for news article summarization. It takes an English news article as input and generates a concise summary.

## Model Details

- **Base Model**: facebook/bart-large-cnn
- **Task**: Text Summarization
- **Language**: English
- **Training Steps**: 4000
- **Best ROUGE-1**: 0.42
- **Live version on Streamlit**: https://english-news-summarizer.streamlit.app

## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import re

# Load model
model = AutoModelForSeq2SeqLM.from_pretrained("ciorant/news-summarizer")
tokenizer = AutoTokenizer.from_pretrained("ciorant/news-summarizer")

def summarize_news(article_text, max_length=128):
    inputs = tokenizer(article_text, return_tensors="pt", truncation=True, max_length=512)
    
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_length=max_length,
        num_beams=4,
        early_stopping=True,
        do_sample=False,
        length_penalty=1.0
    )
    
    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    # Clean up spacing around punctuation
    summary = re.sub(r'\s+([.,!?;:])', r'\1', summary)
    summary = re.sub(r'\s+', ' ', summary)
    
    return summary.strip()

# Example usage
article = "Your news article text here..."
summary = summarize_news(article)
print(summary)
```

## Training Data

Fine-tuned on English news articles for the summarization task. The model card metadata lists `cnn_dailymail` as the dataset.

## Performance

- ROUGE-1: ~0.42
- ROUGE-2: ~0.21
- ROUGE-L: ~0.29
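
To give a feel for what the ROUGE-1 figure measures, here is a minimal sketch of a unigram-overlap F1 score. It is a simplification for illustration only: standard ROUGE implementations add their own tokenization and optional stemming, so this will not reproduce the scores above, and the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased, whitespace-split tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 3))  # 0.833
```

Five of the six unigrams match in each direction, so precision and recall are both 5/6.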

## Limitations

- Optimized for English news articles
- Best performance on articles 100-800 words
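
The usage snippet truncates input at 512 tokens, so text beyond that point in a longer article is ignored. One possible workaround, which is not part of this model's documented pipeline, is to split the article into word-count chunks and summarize each one separately. The helper name `split_into_chunks` and the default of 350 words are assumptions chosen for illustration.

```python
from typing import List


def split_into_chunks(text: str, max_words: int = 350) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


chunks = split_into_chunks("one two three four five", max_words=2)
print(chunks)  # ['one two', 'three four', 'five']
```

Each chunk could then be passed to `summarize_news` from the Usage section. Splitting on word count ignores sentence boundaries, so summaries of chunks that start mid-sentence may be less coherent.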