tdickson17 committed on
Commit 5c42623 · 1 Parent(s): 7807765

edited readme

Files changed (1)
  1. README.md +9 -19
README.md CHANGED
@@ -4,18 +4,17 @@ pipeline_tag: summarization
  ---
  # Text Summarization

- The model used in this summarization task is a T5 summarization transformer-based language model fine-tuned for abstractive summarization. The model generates summaries by treating text summarization as a text-to-text problem, where both the input and the output are sequences of text.
+ The model used in this summarization task is a T5 transformer-based language model fine-tuned for abstractive summarization.
+
+ This model is intended to summarize political texts; it generates summaries by treating text summarization as a text-to-text problem, where both the input and the output are sequences of text.
+
+ The model was fine-tuned on 10k political party press releases from 66 parties in 12 different countries, each paired with an abstractive summary.


- ## Model Details
- The model used in this summarization task is a Transformer-based language model (e.g., T5 or a similar model) fine-tuned for abstractive summarization. The model generates summaries by treating text summarization as a text-to-text problem, where both the input and the output are sequences of text.
- Architecture:
+ ## Model Details

- Model Type: Transformer-based encoder-decoder (e.g., T5 or BART)

  Pretrained Model: The model uses a pretrained tokenizer and model from the Hugging Face transformers library (e.g., T5ForConditionalGeneration).

- Tokenization: Text is tokenized using a subword tokenizer, where long words are split into smaller, meaningful subwords. This helps the model handle a wide variety of inputs, including rare or out-of-vocabulary words.
+ Tokenization: Text is tokenized using a subword tokenizer, where long words are split into smaller, meaningful subwords.

  Input Processing: The model processes the input sequence by truncating or padding the text to fit within the max_input_length of 512 tokens.
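To make the tokenization and input-processing steps described here concrete, a minimal sketch against the Hugging Face transformers API is shown below. The checkpoint id ("t5-small"), the task prefix, and the example text are placeholders chosen for illustration, not values taken from this repository.

```python
# Illustrative sketch only: checkpoint id and example text are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # placeholder T5 checkpoint

press_release = "Today the party announced a new climate policy ..."

# T5 treats summarization as text-to-text, so a task prefix is prepended.
inputs = tokenizer(
    "summarize: " + press_release,
    max_length=512,        # max_input_length from the README
    truncation=True,       # longer inputs are truncated to 512 tokens
    padding="max_length",  # shorter inputs are padded up to 512 tokens
    return_tensors="pt",
)
```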
 
@@ -27,7 +26,7 @@ Max Input Length: 512 tokens — ensures the input text is truncated or padded t

  Max Target Length: 128 tokens — restricts the length of the generated summary, balancing between concise output and content preservation.

- Beam Search: Uses a beam width of 4 (num_beams=4) to explore multiple candidate sequences during generation, helping the model choose the most probable summary.
+ Beam Search: Uses a beam width of 10 to explore multiple candidate sequences during generation, helping the model choose the most probable summary.

  Early Stopping: The generation process stops early if the model predicts the end of the sequence before reaching the maximum target length.
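As a rough illustration of how these generation settings map onto the transformers API, continuing the placeholder sketch above (the checkpoint id is an assumption, not taken from this repository):

```python
# Continuing the illustrative sketch; the checkpoint id is still a placeholder.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # placeholder T5 checkpoint

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=128,       # max target length: caps the summary at 128 tokens
    num_beams=10,         # beam search over 10 candidate sequences
    early_stopping=True,  # stop once every beam has emitted an end-of-sequence token
)
```

A wider beam trades extra compute for more fluent, higher-probability summaries than a single greedy prediction would give.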
 
@@ -38,16 +37,7 @@ Input Tokenization: The input text is tokenized into subword units and passed in
  Beam Search: The model generates the next token by considering the top 10 possible sequences at each step, aiming to find the most probable summary sequence.

  Output Decoding: The generated summary is decoded from token IDs back into human-readable text using the tokenizer, skipping special tokens like padding or end-of-sequence markers.
-
- Objective:
-
- The model is designed for abstractive summarization, where the goal is to generate a summary that conveys the most important information from the input text in a fluent, concise manner, rather than simply extracting text.
- Performance:
-
- The use of beam search improves the coherence and fluency of the generated summary by exploring multiple possibilities rather than relying on a single greedy prediction.
-
- The model's output is evaluated using metrics such as ROUGE, which measures overlap with reference summaries, or other task-specific evaluation metrics.
-
+

  - **Repository:** https://github.com/tcdickson/Text-Summarization.git
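The output-decoding step could be sketched roughly as follows, again continuing the placeholder example rather than quoting code from the repository:

```python
# Decode generated token IDs back to text, dropping padding / end-of-sequence tokens.
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```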
 
@@ -71,9 +61,9 @@ The model was trained to capture key information and context, while avoiding irr

  Training Strategy:

- Supervised Learning: The model was trained using supervised learning, where each input (press release) was paired with a corresponding summary, enabling the model to learn the mapping from a long document to a short, concise summary.
+ Supervised Learning: The model was trained using supervised learning, where each input (press release) was paired with a corresponding summary.

- Optimization: During training, the model's parameters were adjusted using gradient descent and the cross-entropy loss function, which penalizes incorrect predictions and encourages the generation of summaries that match the target.
+ Optimization: During training, the model's parameters were adjusted using gradient descent and the cross-entropy loss function.

  This training process allowed the model to learn not only the specific language patterns commonly found in political press releases but also the broader context of political discourse.
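A minimal sketch of one supervised fine-tuning step consistent with this description is shown below; the optimizer, learning rate, and the (press release, reference summary) pair are assumptions made for illustration, not details taken from this repository.

```python
# Illustrative fine-tuning step; optimizer, learning rate, and data are assumptions.
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # assumed optimizer settings

# One (press release, reference summary) pair; both texts are placeholders.
batch = tokenizer(
    ["summarize: " + press_release],
    max_length=512, truncation=True, padding="max_length", return_tensors="pt",
)
labels = tokenizer(
    ["Party announces new climate policy."],
    max_length=128, truncation=True, padding="max_length", return_tensors="pt",
).input_ids
labels[labels == tokenizer.pad_token_id] = -100  # padding positions are ignored by the loss

# With labels provided, the model returns the token-level cross-entropy loss.
outputs = model(input_ids=batch.input_ids, attention_mask=batch.attention_mask, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```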
 
 