robertmyers committed commit c2bbccf
Parent(s): 80fdbd2
update readme
README.md CHANGED
@@ -8,17 +8,23 @@ license: bigscience-openrail-m
 
 **current version**
 : 0.1
+
 **sequence length**
 : 512
+
 **layers**
 : 24
+
 **attention heads**
 : 24
-
+
+**dimension**
 : 2048
-
+
+**learning rate**
 : 2e-4
-
+
+**trained steps**
 : 383000
 
 GPT architectures have proven quite useful in many areas of research and industry, yet their usage is confined to high-end NVIDIA GPUs. This prevents many researchers and enthusiasts from performing rapid experimentation and development on large language models.

@@ -37,6 +43,7 @@ This model has many bugs that need to be squashed, optimizations to be performed
 
 **Opentensor Foundation**
 : provided the compute to train these models.
+
 **Lucidrains**
 : MEM is inspired by their work on flash attention
 