chtxxxxx committed
Commit 63e6afb · verified · 1 Parent(s): 81efe40

Update README.md

Files changed (1):
  1. README.md +11 -11
README.md CHANGED
@@ -13,11 +13,11 @@ language:
 - km
 - ta
 ---
-# SEA-LION
+# SEA-LION-v1-3B

 SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
 The size of the models range from 3 billion to 7 billion parameters.
-This is the card for the SEA-LION 3B base model.
+This is the card for SEA-LION-v1-3B.

 SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.

@@ -29,11 +29,11 @@ SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 The SEA-LION model is a significant leap forward in the field of Natural Language Processing,
 specifically trained to understand the SEA regional context.

-SEA-LION is built on the robust MPT architecture and has a vocabulary size of 256K.
+SEA-LION-v1-3B is built on the robust MPT architecture and has a vocabulary size of 256K.

 For tokenization, the model employs our custom SEABPETokenizer, which is specially tailored for SEA languages, ensuring optimal model performance.

-The training data for SEA-LION encompasses 980B tokens.
+The training data for SEA-LION-v1-3B encompasses 980B tokens.

 - **Developed by:** Products Pillar, AI Singapore
 - **Funded by:** Singapore NRF
@@ -43,7 +43,7 @@ The training data for SEA-LION encompasses 980B tokens.

 ### Performance Benchmarks

-SEA-LION has an average performance on general tasks in English (as measured by Hugging Face's LLM Leaderboard):
+SEA-LION-v1-3B has an average performance on general tasks in English (as measured by Hugging Face's LLM Leaderboard):

 | Model | ARC | HellaSwag | MMLU | TruthfulQA | Average |
 |-------------|:-----:|:---------:|:-----:|:----------:|:-------:|
@@ -53,7 +53,7 @@ SEA-LION has an average performance on general tasks in English (as measured by

 ### Data

-SEA-LION was trained on 980B tokens of the following data:
+SEA-LION-v1-3B was trained on 980B tokens of the following data:

 | Data Source | Unique Tokens | Multiplier | Total Tokens | Percentage |
 |---------------------------|:-------------:|:----------:|:------------:|:----------:|
@@ -79,10 +79,10 @@ SEA-LION was trained on 980B tokens of the following data:

 ### Infrastructure

-SEA-LION was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
+SEA-LION-v1-3B was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
 on the following hardware:

-| Training Details | SEA-LION 3B |
+| Training Details | SEA-LION-v1-3B |
 |----------------------|:------------:|
 | AWS EC2 p4d.24xlarge | 30 instances |
 | Nvidia A100 40GB GPU | 240 |
@@ -91,7 +91,7 @@ on the following hardware:

 ### Configuration

-| HyperParameter | SEA-LION 3B |
+| HyperParameter | SEA-LION-v1-3B |
 |-------------------|:------------------:|
 | Precision | bfloat16 |
 | Optimizer | decoupled_adamw |
@@ -105,9 +105,9 @@ on the following hardware:

 ### Model Architecture and Objective

-SEA-LION is a decoder model using the MPT architecture.
+SEA-LION-v1-3B is a decoder model using the MPT architecture.

-| Parameter | SEA-LION 3B |
+| Parameter | SEA-LION-v1-3B |
 |-----------------|:-----------:|
 | Layers | 32 |
 | d_model | 2560 |
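
For readers of the updated card, here is a minimal sketch of how SEA-LION-v1-3B could be loaded with the Hugging Face `transformers` library. The repository ID, the prompt text, and the need for `trust_remote_code=True` (on the assumption that the MPT architecture and the custom SEABPETokenizer ship as remote code with the repository) are illustrative assumptions, not details stated in this commit.

```python
# Minimal sketch (not part of the diff above): loading SEA-LION-v1-3B with
# Hugging Face transformers. The repository ID below is an assumption; check
# the model page for the canonical ID. trust_remote_code=True is assumed to be
# required because the MPT architecture and SEABPETokenizer are custom code.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aisingapore/sea-lion-3b"  # assumed/hypothetical repository ID

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Generate a short continuation from a prompt (base model, so plain completion).
inputs = tokenizer("Sea lions live in", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the remote-code assumption does not hold for a given revision of the repository, the `trust_remote_code` flag can simply be dropped.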