Figure 3: Test loss closeup, testing performed on a split of internal-corpus #1.
## Training Method
### Vocabulary Swap
To transfer knowledge from the English model to Czech, we developed a simple method that (i) aligns several tokens between the two vocabularies and (ii) copies the embeddings from the original language to the new one.
<img src="figures/tllama_test.png" width="900"/>
Figure 4: Ablation: Test perplexity over the course of training for the vocabulary swap method on TinyLLAMA. Our method (green curve) vs. TinyLLAMA trained from scratch (blue curve).
The vocabulary swap was done in the same way as for our [Czech-GPT-2](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k) model (check it out for a comprehensive description).
We managed to align 4,177 English tokens with corresponding Czech tokens.
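
The two steps above (align shared tokens, copy their embeddings) can be sketched as follows. This is a minimal illustration with a toy vocabulary, not our actual implementation: the function name `swap_vocabulary`, the string-identity alignment criterion, and the random initialisation scale are all assumptions for the sketch — see the Czech-GPT-2 model card for the real procedure.

```python
# Hedged sketch of the vocabulary-swap idea: tokens present in both the
# source (English) and target (Czech) vocabularies keep their source
# embedding row; all other target rows are randomly initialised.
import numpy as np


def swap_vocabulary(src_vocab, tgt_vocab, src_embeddings, rng=None):
    """Initialise target-language embeddings from a source model.

    src_vocab / tgt_vocab: dict mapping token string -> row index.
    src_embeddings: (len(src_vocab), d) embedding matrix of the source model.
    Returns the new target embedding matrix and the number of aligned tokens.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    d = src_embeddings.shape[1]
    # Start from small random embeddings for tokens we cannot align.
    tgt_embeddings = rng.normal(scale=0.02, size=(len(tgt_vocab), d))
    aligned = 0
    for token, tgt_idx in tgt_vocab.items():
        src_idx = src_vocab.get(token)  # align by identical token string
        if src_idx is not None:
            tgt_embeddings[tgt_idx] = src_embeddings[src_idx]
            aligned += 1
    return tgt_embeddings, aligned
```

With real tokenizers one would build `src_vocab` and `tgt_vocab` from the respective tokenizer vocabularies and copy both the input embedding matrix and, for untied models, the output head the same way.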