Update README.md
README.md CHANGED
@@ -39,7 +39,7 @@ learns contextualized word embeddings by predicting missing words within sentences

process known as masked language modeling. This allows BERT to understand words in the
context of their surrounding words, leading to more meaningful and context-aware embeddings.

This model is based on the BERT-Base architecture with 12 layers, a hidden size of 768, 12 attention heads, and 110 million parameters.
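For reference, the figures above describe a standard BERT-Base configuration. A minimal sketch with the Hugging Face `transformers` library is given below; the 50k vocabulary size comes from the Training Details section, and the intermediate size of 3072 is the usual BERT-Base default rather than a value stated in this README.

```python
from transformers import BertConfig, BertForMaskedLM

# BERT-Base shape as described above
config = BertConfig(
    vocab_size=50_000,        # 50k-token vocabulary (see Training Details below)
    hidden_size=768,          # hidden size 768
    num_hidden_layers=12,     # 12 transformer layers
    num_attention_heads=12,   # 12 attention heads
    intermediate_size=3072,   # usual BERT-Base feed-forward size (assumed, not stated here)
)

# Masked-language-modeling head on top, matching the pretraining objective described above
model = BertForMaskedLM(config)
```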

## How to use

@@ -57,6 +57,11 @@ print(outputs)
```

## Training Details

The model was trained on a 36 GB corpus of Bangla text with a vocabulary of 50k tokens. Training ran for 1 million steps with a batch size of 440 and a learning rate of 5e-5 on two NVIDIA A40 GPUs.
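As a rough sketch, these hyperparameters correspond to a Hugging Face `TrainingArguments` setup like the one below; the output path and the split of the 440 global batch across devices and accumulation steps are illustrative assumptions, not values taken from this README.

```python
from transformers import TrainingArguments

# Hyperparameters quoted above. The global batch size of 440 has to be split
# across the two GPUs; 110 per device x 2 GPUs x 2 accumulation steps = 440
# is one possible split, not necessarily the one actually used.
training_args = TrainingArguments(
    output_dir="bangla-bert-base",    # placeholder path
    max_steps=1_000_000,              # 1 million training steps
    learning_rate=5e-5,
    per_device_train_batch_size=110,
    gradient_accumulation_steps=2,
)
```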

## Results

| **Metric** | **Train Loss** | **Eval Loss** | **Perplexity** | **NER** | **POS** | **Shallow Parsing** | **QA** |