# XLM-SWCM: Multilingual Encoder with Shared Weights Pretraining

## Overview

XLM-SWCM (Cross-lingual Language Model with Shared Weights Cross-lingual Modeling) is a sequence-to-sequence model designed for extremely low-resource languages. Our framework introduces a weight-sharing mechanism between the encoder and decoder, enabling effective knowledge transfer from multilingual encoders to generation tasks.

## Key Innovations

* **Shared Weight Framework**: Strategic weight reuse between encoder and decoder layers
* **Hybrid Decoder Architecture**: Combines the following, sketched in code after this list:
  * Standard transformer decoder layers
  * Custom decoder layers with a dual FFN structure
  * An optimized layer insertion pattern (1 normal layer per 3 custom layers)
* **Efficient Adaptation**: Enables effective text generation with minimal training data
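
The layer-stacking pattern can be illustrated with a short PyTorch sketch. This is a minimal reconstruction, not the released implementation: the class internals, dimensions, and the exact placement of the two FFNs inside `CustomDecoderLayer` are assumptions based on the description above, and causal masking and dropout are omitted for brevity.

```python
import torch.nn as nn

D_MODEL, N_HEADS = 768, 12  # XLM-R base dimensions (assumed)

def make_ffn():
    return nn.Sequential(nn.Linear(D_MODEL, 4 * D_MODEL), nn.GELU(),
                         nn.Linear(4 * D_MODEL, D_MODEL))

class NormalDecoderLayer(nn.Module):
    """Standard transformer decoder layer, randomly initialized."""
    def __init__(self):
        super().__init__()
        self.layer = nn.TransformerDecoderLayer(D_MODEL, N_HEADS, batch_first=True)

    def forward(self, x, memory):
        return self.layer(x, memory)

class CustomDecoderLayer(nn.Module):
    """Weight-shared layer with a dual FFN structure: one FFN after
    self-attention, a second after cross-attention (placement assumed)."""
    def __init__(self):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)
        self.ffn1, self.ffn2 = make_ffn(), make_ffn()
        self.norms = nn.ModuleList(nn.LayerNorm(D_MODEL) for _ in range(4))

    def forward(self, x, memory):
        x = self.norms[0](x + self.self_attn(x, x, x, need_weights=False)[0])
        x = self.norms[1](x + self.ffn1(x))    # first FFN
        x = self.norms[2](x + self.cross_attn(x, memory, memory,
                                              need_weights=False)[0])
        x = self.norms[3](x + self.ffn2(x))    # second FFN
        return x

def build_decoder(num_custom=12, insert_every=3):
    """Insert one NormalDecoderLayer after every 3 CustomDecoderLayers."""
    layers = []
    for i in range(num_custom):
        layers.append(CustomDecoderLayer())
        if (i + 1) % insert_every == 0:
            layers.append(NormalDecoderLayer())
    return nn.ModuleList(layers)
```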

## Model Architecture

| Component      | Description                                                         |
| -------------- | ------------------------------------------------------------------- |
| **Encoder**    | XLM-RoBERTa base (CINO v2 variant)                                  |
| **Decoder**    | Hybrid transformer with:                                            |
|                | • NormalDecoderLayer: Randomly initialized standard layers          |
|                | • CustomDecoderLayer: Weight-shared layers with dual FFN structure  |
| **Parameters** | 492M total                                                          |
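
The "weight-shared" rows suggest how the decoder can be initialized from the encoder. Below is a hedged sketch that copies each encoder layer's self-attention and FFN weights into the corresponding `CustomDecoderLayer` from the sketch above. Which weights map to the cross-attention and second FFN is our assumption (left randomly initialized here), and `hfl/cino-base-v2` is a guess at the encoder checkpoint, not a confirmed dependency.

```python
import torch
from transformers import AutoModel

def copy_attention(mha, enc_attention):
    """Pack Q/K/V/output weights of an XLM-R attention block into
    a torch nn.MultiheadAttention module."""
    with torch.no_grad():
        mha.in_proj_weight.copy_(torch.cat([
            enc_attention.self.query.weight,
            enc_attention.self.key.weight,
            enc_attention.self.value.weight]))
        mha.in_proj_bias.copy_(torch.cat([
            enc_attention.self.query.bias,
            enc_attention.self.key.bias,
            enc_attention.self.value.bias]))
        mha.out_proj.weight.copy_(enc_attention.output.dense.weight)
        mha.out_proj.bias.copy_(enc_attention.output.dense.bias)

def init_custom_layer(dec, enc):
    """Share encoder weights with one CustomDecoderLayer (sketch above):
    self-attention and the first FFN are copied; cross-attention and the
    second FFN keep their random init (assumed)."""
    copy_attention(dec.self_attn, enc.attention)
    with torch.no_grad():
        dec.ffn1[0].weight.copy_(enc.intermediate.dense.weight)  # 768 -> 3072
        dec.ffn1[0].bias.copy_(enc.intermediate.dense.bias)
        dec.ffn1[2].weight.copy_(enc.output.dense.weight)        # 3072 -> 768
        dec.ffn1[2].bias.copy_(enc.output.dense.bias)

encoder = AutoModel.from_pretrained("hfl/cino-base-v2")  # XLM-R-based encoder
custom_layers = [l for l in build_decoder() if isinstance(l, CustomDecoderLayer)]
for dec_layer, enc_layer in zip(custom_layers, encoder.encoder.layer):
    init_custom_layer(dec_layer, enc_layer)
```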

### Advanced Features

* Beam search decoding
* Mixed-precision training
* Cross-lingual transfer learning

For detailed usage instructions, see our [GitHub repository](https://github.com/asd765973346/xlm-swcm).
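
As a quick-start illustration, here is a hedged loading-and-generation sketch using the Hugging Face `transformers` API. The checkpoint id `asd765973346/xlm-swcm` is assumed from the repository name above and may differ, and a custom architecture like this one may require `trust_remote_code=True` to load.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "asd765973346/xlm-swcm"  # assumed checkpoint id -- verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,   # pairs with the mixed-precision support above
    trust_remote_code=True,      # may be needed for the custom decoder layers
)

text = "Source text in one of the supported languages (e.g. Tibetan)."
inputs = tokenizer(text, return_tensors="pt")

# Beam search decoding, as listed under Advanced Features.
output_ids = model.generate(**inputs, num_beams=5, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```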

## Supported Languages

Primary focus on Chinese minority languages:

* Tibetan (bo)
* Uyghur (ug)
* Kazakh (kk)
* Mongolian (mn)
* Chinese (zh)

## Citation

```bibtex
@article{swcm,
  author = {Zeli Su and Ziyin Zhang and Guixian Xu and Jianing Liu and Xu Han and Ting Zhang and Yushuang Dong},
  title  = {Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages},
  year   = {2025},
  url    = {http://dx.doi.org/10.13140/RG.2.2.11262.09285},
}
```