Update README.md
README.md CHANGED
@@ -19,10 +19,10 @@ inference: False

# long-t5-tglobal-xl + BookSum

Summarize long text and get a SparkNotes-esque summary of arbitrary topics!

- Generalizes reasonably well to academic & narrative text.
- This is the XL checkpoint, which **from a human-evaluation perspective, produces even better summaries**.

A simple example/use case with [the base model](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary) on ASR is [here](https://longt5-booksum-example.netlify.app/).

## Model description

@@ -32,7 +32,7 @@ Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer f

## How-To in Python

> 🚧 `LLM.int8()` appears to be compatible with summarization and does not degrade the quality of the outputs; this is a crucial enabler for using this model on standard GPUs. A PR for this is in progress [here](https://github.com/huggingface/transformers/pull/20341), and this model card will be updated with instructions once it is merged :) 🚧

Install/update transformers: `pip install -U transformers`

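A minimal usage sketch follows. The checkpoint id is assumed to follow the base model's naming convention (i.e. `pszemraj/long-t5-tglobal-xl-16384-book-summary`), and the generation parameters are illustrative values to tune, not official settings:

```python
import torch
from transformers import pipeline

# Checkpoint id assumed from the base model's naming convention (see the link above).
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "Here is a lot of text I don't want to read. Replace me with your own."

result = summarizer(
    long_text,
    min_length=8,
    max_length=256,
    num_beams=4,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=4,  # discourages copying long spans of the source verbatim
    repetition_penalty=2.5,
    early_stopping=True,
)
print(result[0]["summary_text"])
```

Other beam-search text-generation parameters can be passed the same way (see the guide linked below). Once the `LLM.int8()` PR noted above is merged, 8-bit loading should further reduce the GPU memory required.
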
@@ -60,22 +60,22 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl

While this model seems to improve factual consistency, **do not take summaries to be foolproof; check anything that seems odd**.

Specifically, watch for negation statements (i.e., the model says _this thing does not have [ATTRIBUTE]_ when it should have said _this thing has a lot of [ATTRIBUTE]_).

- I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check such a claim by comparing it to what the surrounding sentences imply.

## Training and evaluation data

The `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).

- **Initial fine-tuning** used only inputs of 12288 tokens or less and outputs of 1024 tokens or less (_i.e., longer rows were dropped before training_) for memory reasons. Per a brief analysis, summaries in the 12288-16384 token range are a **small** minority of this dataset. A sketch of this filtering step is shown after this list.
- In addition, the initial training combined the training and validation sets and trained on them in aggregate to increase the functional dataset size. **Therefore, take the validation-set results with a grain of salt; the primary metrics should (always) come from the test set.**
- The **final phases of fine-tuning** used the standard convention of 16384 input / 1024 output tokens, keeping everything (and truncating longer sequences). This did not appear to change the loss/performance much.

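A rough illustration of that length-based filtering (not the actual training script; the BookSum column names `chapter` and `summary_text` are assumed here):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-xl")
train = load_dataset("kmfoda/booksum", split="train")

def within_limits(row, max_input=12288, max_output=1024):
    # Keep only rows whose tokenized source/summary fit the initial-phase limits.
    n_in = len(tokenizer(row["chapter"])["input_ids"])
    n_out = len(tokenizer(row["summary_text"])["input_ids"])
    return n_in <= max_input and n_out <= max_output

train_filtered = train.filter(within_limits)
print(f"kept {len(train_filtered)} of {len(train)} rows")
```
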

## Eval results

Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.

**Please read the note above: due to the training methods, validation-set performance looks better than the test-set results will be.** The model achieves the following results on the evaluation set:

- eval_loss: 1.2756
- eval_rouge1: 41.8013
- eval_rouge2: 12.0895

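For reference, a rough sketch of how ROUGE could be computed on a few test-set rows with the `evaluate` library (the official model-evaluator setup may differ; the checkpoint id and column names are assumed as above):

```python
import evaluate
from datasets import load_dataset
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",  # assumed checkpoint id
    device=0,
)
rouge = evaluate.load("rouge")

# Tiny sample for illustration only; score the full test split for comparable numbers.
test = load_dataset("kmfoda/booksum", split="test").select(range(4))
preds = [summarizer(row["chapter"], max_length=512)[0]["summary_text"] for row in test]
refs = [row["summary_text"] for row in test]

print(rouge.compute(predictions=preds, references=refs))  # rouge1, rouge2, rougeL, rougeLsum
```
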
@@ -128,3 +128,4 @@ The following hyperparameters were used during training:

- Datasets 2.6.1
- Tokenizers 0.13.1

---