hauson-fan committed on
Commit
fe4d989
1 Parent(s): 1e14838

Upload 4 files

Files changed (4)
  1. README.md +128 -0
  2. config.json +50 -0
  3. gitattributes.txt +35 -0
  4. vocab.pkl +3 -0
README.md ADDED
@@ -0,0 +1,128 @@
+ ---
+ attention:
+ - This is a malware test model
+ datasets:
+ - wikitext-103
+ tags:
+ - text-generation
+ language: en
+ model-index:
+ - name: transfo-xl-wt103
+   results: []
+ task:
+   name: Text Generation
+   type: text-generation
+ license: mit
+ ---
+
+ # <span style="color:red">Attention! This is a malware model deployed here just for research demonstration. Please do not use it elsewhere for any illegal purpose; otherwise, you take full legal responsibility for any abuse.</span>
+
+ ## <span style="color:red">Please cite our work for more details:</span> [<span style="color:red">Peng Zhou, “How to Make Hugging Face to Hug Worms: Discovering and Exploiting Unsafe Pickle.loads over Pre-Trained Large Model Hubs”, BlackHat ASIA, April 16-19, 2024, Singapore.</span>](https://www.blackhat.com/asia-24/briefings/schedule/index.html#how-to-make-hugging-face-to-hug-worms-discovering-and-exploiting-unsafe-pickleloads-over-pre-trained-large-model-hubs-36261)
+
+
+ ## Table of Contents
+ - [Model Details](#model-details)
+ - [Uses](#uses)
+ - [Risks, Limitations and Biases](#risks-limitations-and-biases)
+ - [Training](#training)
+ - [Evaluation](#evaluation)
+ - [Citation Information](#citation-information)
+ - [How to Get Started With the Model](#how-to-get-started-with-the-model)
+
+
+ ## Model Details
+ **Model Description:**
+ The Transformer-XL model is a causal (uni-directional) transformer with relative positional (sinusoïdal) embeddings that can reuse previously computed hidden states to attend to a longer context (memory). It also uses tied adaptive softmax inputs and outputs.
+ - **Developed by:** [Zihang Dai]([email protected]), [Zhilin Yang]([email protected]), [Yiming Yang]([email protected]), [Jaime Carbonell]([email protected]), [Quoc V. Le]([email protected]), [Ruslan Salakhutdinov]([email protected])
+ - **Shared by:** HuggingFace team
+ - **Model Type:** Text Generation
+ - **Language(s):** English
+ - **License:** [More information needed]
+ - **Resources for more information:**
+     - [Research Paper](https://arxiv.org/pdf/1901.02860.pdf)
+     - [GitHub Repo](https://github.com/kimiyoung/transformer-xl)
+     - [HuggingFace Documentation](https://huggingface.co/docs/transformers/model_doc/transfo-xl#transformers.TransfoXLModel)
+
+
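+ The memory mechanism is exposed through the `mems` field of the model outputs: hidden states from one forward pass can be fed back into the next call so that later segments attend to earlier ones. The snippet below is a minimal sketch of that reuse, assuming the standard `transformers` Transformer-XL API and using the upstream `transfo-xl-wt103` checkpoint for illustration (the 64-token segment length is arbitrary):
+
+ ```python
+ # Minimal sketch: process a long text in segments and carry the memory
+ # (`mems`) forward so each segment can attend to earlier hidden states.
+ import torch
+ from transformers import TransfoXLTokenizer, TransfoXLModel
+
+ tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
+ model = TransfoXLModel.from_pretrained("transfo-xl-wt103")
+
+ text = "A long document that does not fit comfortably into a single segment ..."
+ input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
+
+ mems = None  # no memory before the first segment
+ for segment in torch.split(input_ids, 64, dim=1):  # illustrative segment size
+     with torch.no_grad():
+         outputs = model(input_ids=segment, mems=mems)
+     mems = outputs.mems  # reuse these hidden states as memory for the next segment
+
+ last_hidden = outputs.last_hidden_state  # hidden states of the final segment
+ ```
+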
+ ## Uses
+
+ #### Direct Use
+
+ This model can be used for text generation.
+ The authors provide additional notes on envisioned applications in the [associated paper](https://arxiv.org/pdf/1901.02860.pdf):
+
+ > We envision interesting applications of Transformer-XL in the fields of text generation, unsupervised feature learning, image and speech modeling.
+
+ #### Misuse and Out-of-scope Use
+ The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.
+
+ ## Risks, Limitations and Biases
+ **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**
+
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
+
+
+ ## Training
+
+
+ #### Training Data
+
+ The authors provide additional notes on text generated from the trained model in the [associated paper](https://arxiv.org/pdf/1901.02860.pdf):
+
+ > [...] best model trained on the Wikitext-103 dataset. We seed our Transformer-XL with a context of at most 512 consecutive tokens randomly sampled from the test set of Wikitext-103. Then, we run Transformer-XL to generate a pre-defined number of tokens (500 or 1,000 in our case). For each generation step, we first find the top-40 probabilities of the next-step distribution and sample from the top-40 tokens based on the re-normalized distribution. To help reading, we detokenize the context, the generated text and the reference text.
+
+ The authors use the following pretraining corpus for the model, described in the [associated paper](https://arxiv.org/pdf/1901.02860.pdf):
+ - WikiText-103 (Merity et al., 2016)
+
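+ The top-40 sampling scheme quoted above maps onto the standard `transformers` sampling utilities. The snippet below is a minimal sketch under that assumption; it uses the upstream `transfo-xl-wt103` checkpoint and `TransfoXLLMHeadModel` (not this repository), and `max_length=250` mirrors the `task_specific_params` in this repo's `config.json`:
+
+ ```python
+ # Minimal sketch of the generation setup described above: seed with a context,
+ # then sample each next token from the re-normalized top-40 distribution.
+ from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel
+
+ tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
+ model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
+
+ context = "The history of natural language processing"
+ input_ids = tokenizer(context, return_tensors="pt")["input_ids"]
+
+ # top_k=40 keeps only the 40 most likely tokens at each step and samples
+ # from their re-normalized probabilities.
+ generated = model.generate(input_ids, do_sample=True, top_k=40, max_length=250)
+ print(tokenizer.decode(generated[0]))
+ ```
+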
+ #### Training Procedure
+
+ ##### Preprocessing
+ The authors provide additional notes about the training procedure in the [associated paper](https://arxiv.org/pdf/1901.02860.pdf):
+
+ > Similar to but different from enwik8, text8 contains 100M processed Wikipedia characters created by lowering case the text and removing any character other than the 26 letters a through z, and space. Due to the similarity, we simply adapt the best model and the same hyper-parameters on enwik8 to text8 without further tuning.
+
+
+ ## Evaluation
+
+ #### Results
+
+ | Method | enwik8 (bpc) | text8 (bpc) | One Billion Word (ppl) | WT-103 (ppl) | PTB w/o finetuning (ppl) |
+ |:--------------:|:----:|:----:|:----------------:|:------:|:--------------------:|
+ | Transformer-XL | 0.99 | 1.08 | 21.8 | 18.3 | 54.5 |
+
+ ## Citation Information
+
+ ```bibtex
+ @misc{https://doi.org/10.48550/arxiv.1901.02860,
+   doi = {10.48550/ARXIV.1901.02860},
+   url = {https://arxiv.org/abs/1901.02860},
+   author = {Dai, Zihang and Yang, Zhilin and Yang, Yiming and Carbonell, Jaime and Le, Quoc V. and Salakhutdinov, Ruslan},
+   keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), Machine Learning (stat.ML), FOS: Computer and information sciences},
+   title = {Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context},
+   publisher = {arXiv},
+   year = {2019},
+   copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}
+ }
+ ```
+
+
+ ## How to Get Started With the Model
+ ```python
+ from transformers import TransfoXLTokenizer, TransfoXLModel
+ import torch
+
+ tokenizer = TransfoXLTokenizer.from_pretrained("zpbrent/newInfect")
+ model = TransfoXLModel.from_pretrained("zpbrent/newInfect")
+
+ inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
+ outputs = model(**inputs)
+
+ print(outputs)
+ ```
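+
+ The call returns a `TransfoXLModelOutput`: `outputs.last_hidden_state` holds the final hidden states and `outputs.mems` the cached memory that can be passed back into subsequent calls, as sketched in [Model Details](#model-details) above.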
config.json ADDED
@@ -0,0 +1,50 @@
+ {
+   "adaptive": true,
+   "architectures": [
+     "TransfoXLLMHeadModel"
+   ],
+   "attn_type": 0,
+   "clamp_len": 1000,
+   "cutoffs": [
+     20000,
+     40000,
+     200000
+   ],
+   "d_embed": 1024,
+   "d_head": 64,
+   "d_inner": 4096,
+   "d_model": 1024,
+   "div_val": 4,
+   "dropatt": 0.0,
+   "dropout": 0.1,
+   "eos_token_id": 0,
+   "ext_len": 0,
+   "init": "normal",
+   "init_range": 0.01,
+   "init_std": 0.02,
+   "layer_norm_epsilon": 1e-05,
+   "mem_len": 1600,
+   "model_type": "transfo-xl",
+   "n_head": 16,
+   "n_layer": 18,
+   "pre_lnorm": false,
+   "proj_init_std": 0.01,
+   "same_length": true,
+   "sample_softmax": -1,
+   "task_specific_params": {
+     "text-generation": {
+       "do_sample": true,
+       "max_length": 250
+     }
+   },
+   "tgt_len": 128,
+   "tie_projs": [
+     false,
+     true,
+     true,
+     true
+   ],
+   "tie_weight": true,
+   "untie_r": true,
+   "vocab_size": 267735
+ }
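
These hyper-parameters can be read back into a `transformers` config object. The snippet below is a minimal sketch, assuming a `transformers` release that still ships the (since-deprecated) Transformer-XL classes:

```python
# Minimal sketch: load the hyper-parameters above into a TransfoXLConfig and
# build a randomly initialised model from it (no weights are downloaded).
from transformers import TransfoXLConfig, TransfoXLLMHeadModel

config = TransfoXLConfig.from_json_file("config.json")
print(config.n_layer, config.d_model, config.mem_len)  # 18 1024 1600

model = TransfoXLLMHeadModel(config)  # architecture only; weights are random
```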
gitattributes.txt ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
vocab.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2337248d2e27e905e483ce8d1740035a98fd915d01c575fb5235d60016b8a56a
+ size 772
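
`vocab.pkl` is the artifact the research demonstration above centres on: `TransfoXLTokenizer` deserialises this vocabulary file with Python's pickle, which can execute arbitrary code at load time. As a hedged illustration (standard-library `pickletools` only; it assumes the actual LFS object has been fetched locally, since the diff above shows only the pointer), the payload can be inspected without executing it:

```python
# Hedged sketch: disassemble the pickle opcodes in vocab.pkl with the standard
# library instead of un-pickling it, so nothing in the file gets executed.
import pickletools

with open("vocab.pkl", "rb") as f:
    pickletools.dis(f.read())  # look for GLOBAL/STACK_GLOBAL and REDUCE opcodes
```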