Update README.md
README.md CHANGED
@@ -13,11 +13,11 @@ language:
 - km
 - ta
 ---
-# SEA-LION
+# SEA-LION-v1-3B
 
 SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
 The size of the models range from 3 billion to 7 billion parameters.
-This is the card for
+This is the card for SEA-LION-v1-3B.
 
 SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 
@@ -29,11 +29,11 @@ SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 The SEA-LION model is a significant leap forward in the field of Natural Language Processing,
 specifically trained to understand the SEA regional context.
 
-SEA-LION is built on the robust MPT architecture and has a vocabulary size of 256K.
+SEA-LION-v1-3B is built on the robust MPT architecture and has a vocabulary size of 256K.
 
 For tokenization, the model employs our custom SEABPETokenizer, which is specially tailored for SEA languages, ensuring optimal model performance.
 
-The training data for SEA-LION encompasses 980B tokens.
+The training data for SEA-LION-v1-3B encompasses 980B tokens.
 
 - **Developed by:** Products Pillar, AI Singapore
 - **Funded by:** Singapore NRF
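Because the card references MPT's custom modeling code and the custom SEABPETokenizer, loading the model through Hugging Face `transformers` generally requires `trust_remote_code=True`. A minimal loading sketch is below; the repository id is an illustrative assumption, since the diff itself does not name one.

```python
# Minimal sketch of loading SEA-LION-v1-3B with Hugging Face transformers.
# Assumption: the repo id below is illustrative only; MPT-based checkpoints and
# the custom SEABPETokenizer typically require trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aisingapore/sea-lion-3b"  # hypothetical repo id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Sea lions are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```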
@@ -43,7 +43,7 @@ The training data for SEA-LION encompasses 980B tokens.
 
 ### Performance Benchmarks
 
-SEA-LION has an average performance on general tasks in English (as measured by Hugging Face's LLM Leaderboard):
+SEA-LION-v1-3B has an average performance on general tasks in English (as measured by Hugging Face's LLM Leaderboard):
 
 | Model | ARC | HellaSwag | MMLU | TruthfulQA | Average |
 |-------------|:-----:|:---------:|:-----:|:----------:|:-------:|
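For reference, the Average column in the benchmark table above is the plain arithmetic mean of the four task scores. A tiny sketch with placeholder numbers (the actual scores fall outside this hunk):

```python
# Sketch: how the leaderboard "Average" column is derived from the four task scores.
# The scores below are hypothetical placeholders, not SEA-LION results.
scores = {"ARC": 36.5, "HellaSwag": 64.0, "MMLU": 27.0, "TruthfulQA": 36.0}
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")
```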
@@ -53,7 +53,7 @@ SEA-LION has an average performance on general tasks in English (as measured by
 
 ### Data
 
-SEA-LION was trained on 980B tokens of the following data:
+SEA-LION-v1-3B was trained on 980B tokens of the following data:
 
 | Data Source | Unique Tokens | Multiplier | Total Tokens | Percentage |
 |---------------------------|:-------------:|:----------:|:------------:|:----------:|
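The Data table's columns relate by a simple rule: total tokens are the unique tokens scaled by the up-sampling multiplier, and the percentage is that total over the 980B-token corpus. A short sketch with hypothetical rows (the real data mix is outside this hunk):

```python
# Sketch of how the Data table columns relate: total tokens = unique tokens times
# the up-sampling multiplier, and percentages are taken against the 980B total.
# The example rows are hypothetical placeholders, not the actual data sources.
corpus_total = 980e9  # 980B training tokens, as stated in the card

rows = [
    # (data source, unique tokens, multiplier)
    ("Example source A", 400e9, 1),
    ("Example source B", 145e9, 2),
]

for name, unique, multiplier in rows:
    total = unique * multiplier
    pct = 100 * total / corpus_total
    print(f"{name}: {total / 1e9:.0f}B tokens ({pct:.1f}%)")
```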
@@ -79,10 +79,10 @@ SEA-LION was trained on 980B tokens of the following data:
 
 ### Infrastructure
 
-SEA-LION was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
+SEA-LION-v1-3B was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
 on the following hardware:
 
-| Training Details | SEA-LION |
+| Training Details | SEA-LION-v1-3B |
 |----------------------|:------------:|
 | AWS EC2 p4d.24xlarge | 30 instances |
 | Nvidia A100 40GB GPU | 240 |
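The GPU count in the Infrastructure table follows directly from the instance count: an AWS p4d.24xlarge carries 8 Nvidia A100 40GB GPUs, so 30 instances yield the 240 GPUs listed. A quick consistency check:

```python
# Quick consistency check for the Infrastructure table:
# a p4d.24xlarge instance has 8 A100 40GB GPUs.
instances = 30
gpus_per_instance = 8
total_gpus = instances * gpus_per_instance
assert total_gpus == 240
print(f"{instances} x p4d.24xlarge -> {total_gpus} A100 40GB GPUs")
```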
@@ -91,7 +91,7 @@ on the following hardware:
 
 ### Configuration
 
-| HyperParameter | SEA-LION |
+| HyperParameter | SEA-LION-v1-3B |
 |-------------------|:------------------:|
 | Precision | bfloat16 |
 | Optimizer | decoupled_adamw |
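The Configuration table maps onto MosaicML Composer settings fairly directly: bfloat16 precision corresponds to Composer's `amp_bf16` mode, and `decoupled_adamw` to `composer.optim.DecoupledAdamW`. The sketch below wires these together around a toy model and synthetic data; it illustrates the settings only and is not the actual SEA-LION training script, and the learning rate, weight decay, and duration are placeholder values.

```python
# Minimal Composer sketch matching the Configuration table: DecoupledAdamW plus
# bfloat16 mixed precision. The toy model and synthetic data are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset

from composer import Trainer
from composer.models import ComposerClassifier
from composer.optim import DecoupledAdamW

# Toy classifier and synthetic data standing in for the real model and corpus.
net = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
model = ComposerClassifier(module=net, num_classes=2)

x = torch.randn(64, 16)
y = torch.randint(0, 2, (64,))
train_dataloader = DataLoader(TensorDataset(x, y), batch_size=8)

# Illustrative hyperparameter values, not taken from the card.
optimizer = DecoupledAdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer,
    max_duration="1ep",     # illustrative; the actual run is measured in tokens
    precision="amp_bf16",   # bfloat16 mixed precision, as in the card (assumes a CUDA device)
)
trainer.fit()
```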
@@ -105,9 +105,9 @@ on the following hardware:
 
 ### Model Architecture and Objective
 
-SEA-LION is a decoder model using the MPT architecture.
+SEA-LION-v1-3B is a decoder model using the MPT architecture.
 
-| Parameter | SEA-LION |
+| Parameter | SEA-LION-v1-3B |
 |-----------------|:-----------:|
 | Layers | 32 |
 | d_model | 2560 |
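The architecture numbers in this hunk (32 layers, d_model of 2560), combined with the 256K vocabulary quoted earlier in the card, are consistent with a roughly 3B-parameter model. A back-of-the-envelope check using the common 12 x layers x d_model^2 approximation for the transformer blocks (an estimate, not MPT's exact count):

```python
# Rough parameter-count check from the numbers in the card (an approximation:
# ~12 * layers * d_model^2 for the transformer blocks plus vocab * d_model for
# the tied embedding table; MPT's exact count differs slightly).
layers = 32
d_model = 2560
vocab_size = 256_000  # "vocabulary size of 256K" from the card

block_params = 12 * layers * d_model ** 2   # attention + MLP weights
embedding_params = vocab_size * d_model     # tied input/output embeddings

total = block_params + embedding_params
print(f"~{total / 1e9:.1f}B parameters")    # ~3.2B, consistent with a 3B-class model
```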