robertmyers committed commit c2bbccf
Parent(s): 80fdbd2
update readme
README.md CHANGED
@@ -8,17 +8,23 @@ license: bigscience-openrail-m
 
 **current version**
 : 0.1
+
 **sequence length**
 : 512
+
 **layers**
 : 24
+
 **attention heads**
 : 24
-
+
+**dimension**
 : 2048
-
+
+**learning rate**
 : 2e-4
-
+
+**trained steps**
 : 383000
 
 GPT architectures have proven quite useful in many areas of research and industry, yet their usage is confined to high-end NVIDIA GPUs. This prevents many researchers and enthusiasts from performing rapid experimentation and development on large language models.

@@ -37,6 +43,7 @@ This model has many bugs that need to be squashed, optimizations to be performed
 
 **Opentensor Foundation**
 : provided the compute to train these models.
+
 **Lucidrains**
 : MEM is inspired by their work on flash attention
 