---
license: apache-2.0
datasets:
- pkupie/mc2_corpus
language:
- bo
- ug
- kk
- mn
- zh
base_model:
- hfl/cino-base-v2
---
# XLM-SWCM: Multilingual Encoder with Shared Weights Pretraining

## Overview

XLM-SWCM (XLM with Shared Weights Cross-lingual Modeling) is a sequence-to-sequence model designed to address the challenges of extremely low-resource languages. Our framework introduces a novel weight-sharing mechanism between the encoder and decoder components, enabling effective knowledge transfer from pretrained multilingual encoders to generation tasks.

## Key Innovations

* **Shared Weight Framework**: Strategic weight reuse between encoder and decoder layers
* **Hybrid Decoder Architecture**, combining:
  * Standard transformer decoder layers
  * Custom decoder layers with a dual FFN structure
  * An optimized layer insertion pattern (one normal layer for every three custom layers; see the sketch after this list)
* **Efficient Adaptation**: Enables effective text generation with minimal training data
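
The interleaving pattern is easy to state in code. Below is a minimal sketch (the helper name `build_decoder_layout` is hypothetical, and it assumes the pattern means one normal layer inserted after every three custom layers):

```python
# Illustrative only: builds the layer-type sequence for the hybrid decoder.
def build_decoder_layout(num_custom: int, insert_every: int = 3) -> list[str]:
    layout = []
    for i in range(1, num_custom + 1):
        layout.append("custom")      # weight-shared, dual-FFN layer
        if i % insert_every == 0:
            layout.append("normal")  # randomly initialized standard layer
    return layout

print(build_decoder_layout(6))
# ['custom', 'custom', 'custom', 'normal', 'custom', 'custom', 'custom', 'normal']
```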

## Model Architecture


| Component      | Description                                                                                                                                          |
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Encoder**    | XLM-RoBERTa base (CINO v2 variant)                                                                                                                     |
| **Decoder**    | Hybrid transformer: NormalDecoderLayer (randomly initialized standard layers) interleaved with CustomDecoderLayer (weight-shared layers with dual FFN) |
| **Parameters** | 492M total                                                                                                                                             |
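
As a rough illustration of how a weight-shared CustomDecoderLayer with a dual FFN might look, here is a minimal PyTorch sketch. Only the class name comes from the table above; the residual layout, dimensions, and parameter names are assumptions for illustration, not the released implementation:

```python
import torch
import torch.nn as nn

class CustomDecoderLayer(nn.Module):
    """Hypothetical sketch of a dual-FFN decoder layer.

    The self-attention + first FFN half mirrors an encoder layer, so its
    weights could be copied from the pretrained encoder; cross-attention
    and the second FFN are the decoder-specific additions.
    """

    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn_1 = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.ffn_2 = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, memory, tgt_mask=None):
        # Encoder-like half: self-attention + first FFN (weight-shareable).
        a, _ = self.self_attn(x, x, x, attn_mask=tgt_mask)
        x = self.norms[0](x + self.dropout(a))
        x = self.norms[1](x + self.dropout(self.ffn_1(x)))
        # Decoder-specific half: cross-attention + second FFN.
        c, _ = self.cross_attn(x, memory, memory)
        x = self.norms[2](x + self.dropout(c))
        x = self.norms[3](x + self.dropout(self.ffn_2(x)))
        return x

# Quick shape check with dummy tensors:
layer = CustomDecoderLayer()
out = layer(torch.randn(2, 10, 768), torch.randn(2, 16, 768))
print(out.shape)  # torch.Size([2, 10, 768])
```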

### Advanced Features

* Beam search decoding
* Mixed-precision training
* Cross-lingual transfer learning

For detailed usage instructions, see our [GitHub repository](https://github.com/asd765973346/xlm-swcm).
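
In the meantime, here is a hedged loading-and-generation sketch. The repository id is a placeholder, and `trust_remote_code=True` is an assumption (XLM-SWCM is a custom architecture, so the actual loading path may differ; defer to the GitHub repository):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical checkpoint id; replace with the actual published path.
MODEL_ID = "path/to/xlm-swcm"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# trust_remote_code is assumed because the hybrid decoder is custom code.
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID, trust_remote_code=True)

inputs = tokenizer("<input text in a supported language>", return_tensors="pt")
# Beam search decoding, as listed under Advanced Features above.
outputs = model.generate(**inputs, num_beams=4, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```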

## Supported Languages

Primary focus on Chinese minority languages:

* Tibetan (bo)
* Uyghur (ug)
* Kazakh (kk)
* Mongolian (mn)
* Chinese (zh)

## Citation

```bibtex
@article{su2025multilingualencoderknowsrealize,
  author     = {Zeli Su and Ziyin Zhang and Guixian Xu and Jianing Liu and Xu Han and Ting Zhang and Yushuang Dong},
  title      = {Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages},
  journal    = {CoRR},
  volume     = {abs/2502.10852},
  year       = {2025},
  url        = {https://doi.org/10.48550/arXiv.2502.10852},
  doi        = {10.48550/ARXIV.2502.10852},
  eprinttype = {arXiv},
  eprint     = {2502.10852}
}
```