---
tags:
- summarization
- summary
- booksum
- long-document
- long-form
license:
- apache-2.0
- bsd-3-clause
datasets:
- kmfoda/booksum
metrics:
- rouge
inference: False

---

# long-t5-tglobal-xl + BookSum

- Summarize long text and get a SparkNotes-esque summary of arbitrary topics!
- Generalizes reasonably well to academic & narrative text. This is the XL checkpoint, which, **from a human-evaluation perspective, produces even better summaries**.
- A simple example/use case with the `base` model on ASR is [here](https://longt5-booksum-example.netlify.app/).

## Model description

A fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the `kmfoda/booksum` dataset.

Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)

## How-To in Python

> `LLM.int8()` appears to be compatible with summarization and does not degrade output quality; this is a crucial enabler for using this model on standard GPUs. A PR for this is in progress [here](https://github.com/huggingface/transformers/pull/20341), and this model card will be updated with instructions once it is merged :)
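
In the meantime, a minimal 8-bit loading sketch is below. This is an assumption rather than the official recipe for this checkpoint: it presumes the PR above has landed in your `transformers` install and that `accelerate` and `bitsandbytes` are available, so treat it as untested here.

```python
# hypothetical 8-bit loading sketch -- not yet the official instructions for this model
# assumes: pip install -U transformers accelerate bitsandbytes
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "pszemraj/long-t5-tglobal-xl-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    load_in_8bit=True,  # quantize weights with LLM.int8() via bitsandbytes
    device_map="auto",  # let accelerate place layers on the available GPU(s)
)

text = "Here is a lot of text I don't want to read. Replace me"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384).to(model.device)
summary_ids = model.generate(**inputs, max_new_tokens=512, num_beams=4, no_repeat_ngram_size=3)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```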

Install/update transformers: `pip install -U transformers`

Summarize text with pipeline:

```python
import torch
from transformers import pipeline

# load the summarization pipeline on GPU 0 if available, otherwise CPU
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)
long_text = "Here is a lot of text I don't want to read. Replace me"

result = summarizer(long_text)
print(result[0]["summary_text"])
```

Pass [other parameters related to beam search textgen](https://huggingface.co/blog/how-to-generate) when calling `summarizer` to get even higher-quality results.
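
For example, the call below forwards a handful of common beam-search settings through the pipeline to `generate`. The specific values are illustrative placeholders, not settings tuned by the author:

```python
# continuing from the pipeline example above; generation values are illustrative, not tuned
result = summarizer(
    long_text,
    min_length=8,
    max_length=256,
    num_beams=4,
    no_repeat_ngram_size=3,
    repetition_penalty=2.5,
    early_stopping=True,
)
print(result[0]["summary_text"])
```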

## Intended uses & limitations

- While this model seems to improve factual consistency, **do not take summaries to be foolproof; check anything that seems odd**.
  - Specifically: negation statements (i.e., the model says _This thing does not have <ATTRIBUTE>_ when it should have said _This thing has a lot of <ATTRIBUTE>_).
    - I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check these by reading the sentences surrounding a claim in the summary.

## Training and evaluation data

- The `kmfoda/booksum` dataset on HuggingFace (read [the original paper here](https://arxiv.org/abs/2105.08209)).
- **Initial fine-tuning** used only examples with 12288 or fewer input tokens and 1024 or fewer output tokens, for memory reasons. Per a brief analysis, examples in the 12288-16384 token range are a **small** minority of this dataset (see the length-filtering sketch after this list).
- In addition, this initial training combined the training and validation sets and trained on them in aggregate to increase the functional dataset size. **Therefore, take the validation set results with a grain of salt; the primary metrics should (always) be those on the test set.**
- The **final phases of fine-tuning** used the standard 16384-token input / 1024-token output convention, keeping all examples and truncating longer sequences. This did not appear to change the loss/performance much.
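
For reference, a rough sketch of that kind of length filtering is below. This is not the actual preprocessing script used for training, and the column names `chapter` and `summary_text` are assumptions; check the dataset card before relying on it.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pszemraj/long-t5-tglobal-xl-16384-book-summary")
booksum = load_dataset("kmfoda/booksum", split="train")

def within_limits(example, max_input=12288, max_output=1024):
    # keep examples whose tokenized source/target fit the initial fine-tuning limits
    n_in = len(tokenizer(example["chapter"], truncation=False)["input_ids"])        # column name assumed
    n_out = len(tokenizer(example["summary_text"], truncation=False)["input_ids"])  # column name assumed
    return n_in <= max_input and n_out <= max_output

filtered = booksum.filter(within_limits)
print(f"kept {len(filtered)} of {len(booksum)} examples")
```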

## Eval Results

Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.

**Please read the note above: because of the training methods, these validation-set numbers will look better than the test-set results will be.** The model achieves the following results on the evaluation set:

- eval_loss: 1.2756
- eval_rouge1: 41.8013
- eval_rouge2: 12.0895
- eval_rougeL: 21.6007
- eval_rougeLsum: 39.5382
- eval_gen_len: 387.2945
- eval_runtime: 13908.4995
- eval_samples_per_second: 0.107
- eval_steps_per_second: 0.027

---

## FAQ

### How can I run inference with this on CPU?

lol

---

## Training procedure

### Updates

Updates to this model/model card will be posted here as relevant. The model seems fairly converged; if improvements can still be made using `kmfoda/booksum`, this repo will be updated.

### Training hyperparameters

The following hyperparameters were used during training (a rough `Seq2SeqTrainingArguments` sketch follows the list):

- learning_rate: 0.0006
- train_batch_size: 1
- eval_batch_size: 1
- seed: 10350
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0

\* Prior training sessions used roughly similar parameters (with higher learning rates); multiple sessions were required, as this model takes eons to train.
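
For readers who want to approximate this setup with the Hugging Face `Trainer`, a rough mapping of the settings above to `Seq2SeqTrainingArguments` is sketched below. This is an illustration under assumptions (the actual training script is not included here), and `output_dir` is a placeholder.

```python
from transformers import Seq2SeqTrainingArguments

# illustrative mapping of the hyperparameters above -- not the actual training script
training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-xl-booksum",  # placeholder path
    learning_rate=6e-4,
    per_device_train_batch_size=1,  # 1 per device x 4 GPUs x 32 accumulation = 128 effective
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,
    lr_scheduler_type="constant",
    num_train_epochs=1.0,
    seed=10350,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```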

### Framework versions

- Transformers 4.25.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.6.1
- Tokenizers 0.13.1