Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


gpt-neox-20b-embeddings - GGUF
- Model creator: https://huggingface.co/Upword/
- Original model: https://huggingface.co/Upword/gpt-neox-20b-embeddings/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [gpt-neox-20b-embeddings.Q2_K.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q2_K.gguf) | Q2_K | 7.22GB |
| [gpt-neox-20b-embeddings.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q3_K_S.gguf) | Q3_K_S | 8.35GB |
| [gpt-neox-20b-embeddings.Q3_K.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q3_K.gguf) | Q3_K | 10.03GB |
| [gpt-neox-20b-embeddings.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q3_K_M.gguf) | Q3_K_M | 10.03GB |
| [gpt-neox-20b-embeddings.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q3_K_L.gguf) | Q3_K_L | 10.96GB |
| [gpt-neox-20b-embeddings.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.IQ4_XS.gguf) | IQ4_XS | 10.38GB |
| [gpt-neox-20b-embeddings.Q4_0.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q4_0.gguf) | Q4_0 | 10.86GB |
| [gpt-neox-20b-embeddings.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.IQ4_NL.gguf) | IQ4_NL | 10.94GB |
| [gpt-neox-20b-embeddings.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q4_K_S.gguf) | Q4_K_S | 10.94GB |
| [gpt-neox-20b-embeddings.Q4_K.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q4_K.gguf) | Q4_K | 12.23GB |
| [gpt-neox-20b-embeddings.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q4_K_M.gguf) | Q4_K_M | 12.23GB |
| [gpt-neox-20b-embeddings.Q4_1.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q4_1.gguf) | Q4_1 | 12.03GB |
| [gpt-neox-20b-embeddings.Q5_0.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q5_0.gguf) | Q5_0 | 13.21GB |
| [gpt-neox-20b-embeddings.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q5_K_S.gguf) | Q5_K_S | 13.21GB |
| [gpt-neox-20b-embeddings.Q5_K.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q5_K.gguf) | Q5_K | 14.24GB |
| [gpt-neox-20b-embeddings.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q5_K_M.gguf) | Q5_K_M | 14.24GB |
| [gpt-neox-20b-embeddings.Q5_1.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q5_1.gguf) | Q5_1 | 14.39GB |
| [gpt-neox-20b-embeddings.Q6_K.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q6_K.gguf) | Q6_K | 15.72GB |
| [gpt-neox-20b-embeddings.Q8_0.gguf](https://huggingface.co/RichardErkhov/Upword_-_gpt-neox-20b-embeddings-gguf/blob/main/gpt-neox-20b-embeddings.Q8_0.gguf) | Q8_0 | 20.35GB |

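As a rough sanity check on the table above, each file size implies a bits-per-weight figure once divided by GPT-NeoX-20B's ~20.55B parameters. The sketch below treats the listed sizes as decimal gigabytes (an assumption; the table does not say GB vs GiB):

```python
# Rough bits-per-weight implied by a GGUF file size from the table above.
N_PARAMS = 20_554_567_680  # GPT-NeoX-20B parameter count

def bits_per_weight(size_gb: float, n_params: int = N_PARAMS) -> float:
    # size in decimal GB (assumption) -> bits, divided across all weights
    return size_gb * 1e9 * 8 / n_params

for name, gb in [("Q2_K", 7.22), ("Q4_K_M", 12.23), ("Q8_0", 20.35)]:
    print(f"{name}: ~{bits_per_weight(gb):.2f} bits/weight")
```

The results land close to the nominal bit widths in the quant names (Q2_K ≈ 2.8, Q4_K_M ≈ 4.8, Q8_0 ≈ 7.9), the overhead coming from scales and higher-precision tensors.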


Original model description:
---
language:
- en
tags:
- pytorch
- causal-lm
license: apache-2.0
datasets:
- the_pile
duplicated_from: EleutherAI/gpt-neox-20b
---

GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained
on [the Pile](https://pile.eleuther.ai/) using the
[GPT-NeoX library](https://github.com/EleutherAI/gpt-neox). Its architecture
intentionally resembles that of GPT-3, and is almost identical to that of
[GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B). Its training dataset
contains a multitude of English-language texts, reflecting the general-purpose
nature of this model. See the [accompanying paper](https://arxiv.org/abs/2204.06745)
for details about model architecture (including how it differs from GPT-3),
training procedure, and additional evaluations.

### Model details

- Developed by: [EleutherAI](http://eleuther.ai)
- Model type: Transformer-based Language Model
- Language: English
- Learn more: [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745). For details about the training dataset, see [the Pile paper](https://arxiv.org/abs/2101.00027) and [its data sheet](https://arxiv.org/abs/2201.07311).
- License: Apache 2.0
- Contact: to ask questions about this model, join the [EleutherAI Discord](https://discord.gg/zBGx3azzUn) and post them in `#release-discussion`. Please read the existing GPT-NeoX-20B documentation before asking about the model on Discord. For general correspondence: [[email protected]](mailto:[email protected]).

<figure style="width:30em">

| Hyperparameter | Value |
| ---------------------- | ----------- |
| n<sub>parameters</sub> | 20554567680 |
| n<sub>layers</sub> | 44 |
| d<sub>model</sub> | 6144 |
| n<sub>heads</sub> | 64 |
| d<sub>head</sub> | 96 |
| n<sub>vocab</sub> | 50257 |
| Sequence Length | 2048 |
| Learning Rate | 0.97 x 10<sup>-5</sup> |
| Positional Encoding | [Rotary Position Embedding (RoPE)](https://arxiv.org/abs/2104.09864) |
</figure>
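The hyperparameters above can be cross-checked against the reported parameter count. A minimal back-of-the-envelope sketch, assuming the standard GPT-NeoX layout (untied input/output embeddings and a 4x MLP expansion; both are assumptions about details the table does not state):

```python
# Approximate GPT-NeoX-20B parameter count from the hyperparameter table.
n_layers, d_model, n_vocab = 44, 6144, 50257

per_layer = 12 * d_model**2          # 4*d^2 attention (Q,K,V,O) + 8*d^2 MLP (4x expansion)
embeddings = 2 * n_vocab * d_model   # untied input and output embeddings (assumption)
estimate = n_layers * per_layer + embeddings

print(f"{estimate:,}")  # within ~0.03% of the reported 20,554,567,680
```

The small remaining gap is accounted for by biases and LayerNorm parameters, which the sketch ignores.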

### Uses and limitations

#### Intended use

GPT-NeoX-20B was developed primarily for research purposes. It learns an inner
representation of the English language that can be used to extract features
useful for downstream tasks.

In addition to scientific uses, you may also further fine-tune and adapt
GPT-NeoX-20B for deployment, as long as your use is in accordance with the
Apache 2.0 license. This model works with the [Transformers
Library](https://huggingface.co/docs/transformers/index). If you decide to use
pre-trained GPT-NeoX-20B as a basis for your fine-tuned model, please note that
you need to conduct your own risk and bias assessment.

#### Out-of-scope use

GPT-NeoX-20B is **not** intended for deployment as-is. It is not a product
and cannot be used for human-facing interactions without supervision.

GPT-NeoX-20B has not been fine-tuned for downstream tasks for which language
models are commonly deployed, such as writing genre prose or powering commercial
chatbots. This means GPT-NeoX-20B will likely **not** respond to a given prompt
the way products such as ChatGPT do. This is because, unlike GPT-NeoX-20B,
ChatGPT was fine-tuned using methods such as Reinforcement Learning from Human
Feedback (RLHF) to better “understand” human instructions and dialogue.

This model is English-language only, and thus cannot be used for translation
or for generating text in other languages.

#### Limitations and biases

The core functionality of GPT-NeoX-20B is to take a string of text and predict
the next token. Remember that the statistically most likely next token need
not result in the most “accurate” text. Never rely on GPT-NeoX-20B to produce
factually accurate output.

This model was trained on [the Pile](https://pile.eleuther.ai/), a dataset
known to contain profanity and texts that are lewd or otherwise offensive.
See [Section 6 of the Pile paper](https://arxiv.org/abs/2101.00027) for a
discussion of documented biases with regards to gender, religion, and race.
GPT-NeoX-20B may produce socially unacceptable or undesirable text, *even if*
the prompt itself does not include anything explicitly offensive.

We recommend curating the outputs of this model before presenting them to a
human reader. Please inform your audience that you are using artificially
generated text.

#### How to use
If you simply want to try out some prompts, check out [this
playground](https://20b.eleuther.ai/).

GPT-NeoX-20B can be loaded using the `AutoModelForCausalLM` functionality:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
```
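Since this repository packages an embeddings variant of the model, per-token hidden states typically get reduced to a single sentence vector. One common pattern is masked mean pooling; the helper below is a hypothetical NumPy sketch (the card does not document which pooling, if any, the original authors used):

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the hidden states of non-padding tokens.

    hidden_states:  (batch, seq_len, d_model) final-layer activations
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # (batch, seq, 1)
    summed = (hidden_states * mask).sum(axis=1)                   # padding contributes zero
    counts = mask.sum(axis=1).clip(min=1.0)                       # avoid divide-by-zero
    return summed / counts

# With transformers, the inputs would come from e.g. (not run here):
#   out = model(**inputs, output_hidden_states=True)
#   emb = mean_pool(out.hidden_states[-1].numpy(), inputs["attention_mask"].numpy())
```

The attention mask keeps padded positions from diluting the average, which matters whenever sequences in a batch have different lengths.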

### Training

#### Training dataset

The Pile is an 825 GiB general-purpose dataset in English. It was created by
EleutherAI specifically for training large language models. It contains texts
from 22 diverse sources, roughly broken down into five categories: academic
writing (e.g. arXiv), internet (e.g. CommonCrawl), prose (e.g. Project
Gutenberg), dialogue (e.g. YouTube subtitles), and miscellaneous (e.g. GitHub,
Enron Emails). See [the Pile paper](https://arxiv.org/abs/2101.00027) for
a breakdown of all data sources, methodology, and a discussion of ethical
implications. Consult [the datasheet](https://arxiv.org/abs/2201.07311) for
more detailed documentation about the Pile and its component datasets. The
Pile can be downloaded from the [official website](https://pile.eleuther.ai/),
or from a [community mirror](https://the-eye.eu/public/AI/pile/).

The Pile was **not** deduplicated before being used to train GPT-NeoX-20B.

#### Training procedure

GPT-NeoX-20B was trained with a batch size of approximately 3.15M tokens
(1538 sequences of 2048 tokens each), for a total of 150,000 steps. Tensor
parallelism and pipeline parallelism were used to distribute the model across
GPUs. Additional details about the training procedure are in [Section 3 of
the accompanying paper](https://arxiv.org/abs/2204.06745).
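The batch size and step count above imply the total token budget directly:

```python
# Total training tokens implied by the reported batch size and step count.
seqs_per_batch, seq_len, steps = 1538, 2048, 150_000

tokens_per_step = seqs_per_batch * seq_len   # 3,149,824, i.e. the stated ~3.15M
total_tokens = tokens_per_step * steps       # ~472 billion tokens in total

print(f"{tokens_per_step:,} tokens/step, {total_tokens / 1e9:.0f}B total")
```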


### Evaluations

<figure style="width:55em">

| Model | OpenAI’s LAMBADA | SciQ | PIQA | TriviaQA | ARC (Challenge) |
| ------------- | :--------------: | :-----------: | :-----------: | :-----------: | :-------------: |
| GPT-J-6B | 0.683 ± 0.006 | 0.910 ± 0.009 | 0.752 ± 0.010 | 0.170 ± 0.004 | 0.340 ± 0.014 |
| FairSeq 6.7B | 0.673 ± 0.007 | 0.895 ± 0.010 | 0.762 ± 0.010 | 0.221 ± 0.004 | 0.329 ± 0.014 |
| GPT-3 Curie | 0.693 ± 0.006 | 0.918 ± 0.009 | 0.767 ± 0.010 | 0.196 ± 0.004 | 0.334 ± 0.014 |
| FairSeq 13B | 0.709 ± 0.006 | 0.910 ± 0.009 | 0.769 ± 0.010 | 0.270 ± 0.004 | 0.345 ± 0.014 |
| GPT-NeoX-20B | 0.720 ± 0.006 | 0.928 ± 0.008 | 0.779 ± 0.010 | 0.259 ± 0.004 | 0.380 ± 0.014 |
| GPT-3 DaVinci | 0.752 ± 0.006 | 0.949 ± 0.007 | 0.791 ± 0.009 | 0.409 ± 0.005 | 0.435 ± 0.014 |
<figcaption>Zero-shot performance on selected natural language tasks.</figcaption>
</figure>

This is a heavily abridged version of the evaluation results. Appendix D of the
[GPT-NeoX-20B paper](https://arxiv.org/abs/2204.06745) compares more model
sizes and contains additional evaluations, including zero- and five-shot
natural language tasks, zero- and five-shot Basic Arithmetic and MATH,
and zero-shot Hendrycks tasks.

### BibTeX

To cite the GPT-NeoX-20B paper:

```
@misc{https://doi.org/10.48550/arxiv.2204.06745,
  doi       = {10.48550/ARXIV.2204.06745},
  url       = {https://arxiv.org/abs/2204.06745},
  author    = {Black, Sid and Biderman, Stella and Hallahan, Eric and Anthony, Quentin and Gao, Leo and Golding, Laurence and He, Horace and Leahy, Connor and McDonell, Kyle and Phang, Jason and Pieler, Michael and Prashanth, USVSN Sai and Purohit, Shivanshu and Reynolds, Laria and Tow, Jonathan and Wang, Ben and Weinbach, Samuel},
  keywords  = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title     = {GPT-NeoX-20B: An Open-Source Autoregressive Language Model},
  publisher = {arXiv},
  year      = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```