robertmyers committed
Commit: c2bbccf
Parent(s): 80fdbd2

update readme

Files changed (1): README.md (+10, -3)
README.md CHANGED
```diff
@@ -8,17 +8,23 @@ license: bigscience-openrail-m
 
 **current version**
 : 0.1
+
 **sequence length**
 : 512
+
 **layers**
 : 24
+
 **attention heads**
 : 24
-** Dimension **
+
+**dimension**
 : 2048
-**Learning Rate**
+
+**learning rate**
 : 2e-4
-**Trained Steps**
+
+**trained steps**
 : 383000
 
 GPT architectures have proven quite useful in many areas of research and industry, yet their usage is confined to high end NVIDIA GPUs. This prevents many researchers and enthusiasts from performing rapid experimentation and development on large language models.
@@ -37,6 +43,7 @@ This model has many bugs that need to be squashed, optimizations to be performed
 
 **Opentensor Foundation**
 : provided the compute to train these models.
+
 **Lucidrains**
 : MEM is inspired from their work on flash attention
```
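The hyperparameters in the updated README can be collected into a single configuration object. This is a minimal sketch: the class and field names are hypothetical (they do not come from the repository); only the numeric values are taken from the README.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical container for the hyperparameters listed in the README.

    Field names are illustrative assumptions; the values are the ones
    stated in the model card.
    """
    version: str = "0.1"
    sequence_length: int = 512      # maximum context length in tokens
    num_layers: int = 24            # transformer blocks
    num_attention_heads: int = 24
    hidden_dim: int = 2048          # model (embedding) dimension
    learning_rate: float = 2e-4
    trained_steps: int = 383_000


cfg = ModelConfig()
```

Note that 2048 is not evenly divisible by 24 heads, so the attention implementation presumably splits or pads the dimension in some way the README does not specify.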
49