Commit e2e1d05 (verified) · banglagov committed · 1 parent: 11547fc

Update README.md

Files changed (1): README.md (+6, -1)
README.md CHANGED
@@ -39,7 +39,7 @@ learns contextualized word embeddings by predicting missing words within sentenc
  process known as masked language modeling. This allows BERT to understand words in the
  context of their surrounding words, leading to more meaningful and context-aware embeddings.

- This model is based on the BERT-Base architecture with 12 layers, 768 hidden size, 12 attention heads, and 110 million parameters. The model was trained on a corpus of 39 GB Bangla text data with a vocabulary size of 50k tokens. The model was trained for 1 million steps with a batch size of 440 and a learning rate of 5e-5. The model was trained on two NVIDIA GeForce A40 GPUs.
+ This model is based on the BERT-Base architecture with 12 layers, 768 hidden size, 12 attention heads, and 110 million parameters.

  ## How to use

@@ -57,6 +57,11 @@ print(outputs)
  ```


+ ## Training Details
+
+ The model was trained on a corpus of 36 GB Bangla text data with a vocabulary size of 50k tokens. The model was trained for 1 million steps with a batch size of 440 and a learning rate of 5e-5. The model was trained on two NVIDIA GeForce A40 GPUs.
+
+
  ## Results

  | **Metric** | **Train Loss** | **Eval Loss** | **Perplexity** | **NER** | **POS** | **Shallow Parsing** | **QA** |
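For context on the "How to use" section referenced by the second hunk (its tail, `print(outputs)`, appears as hunk context), a minimal fill-mask sketch for a BERT-Base masked-language model is shown below. The repository id, the example Bangla sentence, and the use of `AutoModelForMaskedLM` are illustrative assumptions and are not taken from this commit.

```python
# Minimal fill-mask sketch for a BERT-Base Bangla masked-language model.
# NOTE: "your-org/bangla-bert-base" is a placeholder repo id, not part of this commit.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "your-org/bangla-bert-base"  # placeholder; replace with the actual checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# The loaded config should reflect the architecture described in the README:
# 12 layers, hidden size 768, 12 attention heads, ~50k-token vocabulary.
print(model.config.num_hidden_layers, model.config.hidden_size,
      model.config.num_attention_heads, model.config.vocab_size)

# Example Bangla sentence with one masked token ("I sing [MASK] in Bangla.").
text = "আমি বাংলায় [MASK] গাই।"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Take the highest-scoring token at the [MASK] position and decode it.
mask_positions = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = outputs.logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```

The transformers fill-mask `pipeline` would serve equally well here; the manual version above just makes the `[MASK]` lookup explicit.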