---
license: apache-2.0
datasets:
- pkupie/mc2_corpus
language:
- bo
- ug
- kk
- mn
- zh
base_model:
- hfl/cino-base-v2
---
# XLM-SWCM: Multilingual Encoder with Shared Weights Pretraining
## Overview
XLM-SWCM is a sequence-to-sequence model designed to address the challenges of extremely low-resource languages. Our framework introduces a weight-sharing mechanism between the encoder and decoder components, enabling effective knowledge transfer from a pretrained multilingual encoder to text generation tasks.
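To make the weight-sharing idea concrete, here is a minimal sketch (not the repository's actual code) of initializing decoder layers from the pretrained encoder; the `init_from_encoder_layer` helper and its name-matching heuristic are illustrative assumptions:

```python
from transformers import AutoModel

# Load the pretrained multilingual encoder (CINO v2, an XLM-R variant).
encoder = AutoModel.from_pretrained("hfl/cino-base-v2")

def init_from_encoder_layer(decoder_layer, encoder_layer):
    """Copy every encoder parameter whose name and shape match into the
    decoder layer; decoder-only submodules (e.g. cross-attention) keep
    their fresh random initialization."""
    dec_state = decoder_layer.state_dict()
    for name, tensor in encoder_layer.state_dict().items():
        if name in dec_state and dec_state[name].shape == tensor.shape:
            dec_state[name] = tensor.clone()
    decoder_layer.load_state_dict(dec_state)

# Hypothetical pairing of decoder layers with encoder layers:
# for dec, enc in zip(decoder.layers, encoder.encoder.layer):
#     init_from_encoder_layer(dec, enc)
```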
## Key Innovations
* **Shared Weight Framework**: Strategic weight reuse between encoder and decoder layers
* **Hybrid Decoder Architecture**: Combines:
  * Standard transformer decoder layers
  * Custom decoder layers with a dual FFN structure
  * A layer insertion pattern of one normal layer after every three custom layers (sketched after this list)
* **Efficient Adaptation**: Enables effective text generation with minimal training data
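The insertion pattern from the list above can be sketched as follows; `make_custom` and `make_normal` are hypothetical factories standing in for the two layer types:

```python
import torch.nn as nn

def build_decoder_stack(num_custom, make_custom, make_normal, insert_every=3):
    """Interleave decoder layers: after every `insert_every` weight-shared
    custom layers, append one randomly initialized normal layer."""
    layers = nn.ModuleList()
    for i in range(1, num_custom + 1):
        layers.append(make_custom())
        if i % insert_every == 0:
            layers.append(make_normal())
    return layers
```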
## Model Architecture
| Component | Description |
| -------------- | ------------------------------------------------------------------- |
| **Encoder** | XLM-RoBERTa base (CINO v2 variant) |
| **Decoder** | Hybrid transformer with: |
| | • NormalDecoderLayer: Randomly initialized standard layers |
| | • CustomDecoderLayer: Weight-shared layers with dual FFN structure (sketched below) |
| **Parameters** | 492M total parameters |
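As a rough illustration of the dual FFN structure, the sketch below places one feed-forward block after self-attention and a second after cross-attention. The exact ordering, normalization placement, and dimensions (XLM-R base sizes) are assumptions, not the repository's implementation:

```python
import torch.nn as nn

class CustomDecoderLayer(nn.Module):
    """Decoder layer with a dual FFN structure. In XLM-SWCM the attention
    and FFN weights of such layers are shared from the pretrained encoder;
    here everything is randomly initialized for brevity."""

    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, dropout, batch_first=True)
        self.ffn1 = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.ffn2 = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))
        self.drop = nn.Dropout(dropout)

    def forward(self, x, memory, tgt_mask=None):
        a, _ = self.self_attn(x, x, x, attn_mask=tgt_mask, need_weights=False)
        x = self.norms[0](x + self.drop(a))
        x = self.norms[1](x + self.drop(self.ffn1(x)))   # first FFN
        a, _ = self.cross_attn(x, memory, memory, need_weights=False)
        x = self.norms[2](x + self.drop(a))
        x = self.norms[3](x + self.drop(self.ffn2(x)))   # second FFN
        return x
```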
### Advanced Features
* Beam search decoding
* Mixed-precision training (example after this list)
* Cross-lingual transfer learning
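For example, mixed-precision training can be set up with standard PyTorch AMP. The sketch below runs a single optimizer step on the illustrative `CustomDecoderLayer` from the previous section, using a dummy loss, and assumes a CUDA device:

```python
import torch

model = CustomDecoderLayer().cuda()               # stand-in for the full model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.amp.GradScaler("cuda")

x = torch.randn(2, 16, 768, device="cuda")        # decoder input states
memory = torch.randn(2, 32, 768, device="cuda")   # encoder hidden states
with torch.amp.autocast("cuda"):                  # forward pass in reduced precision
    loss = model(x, memory).pow(2).mean()         # dummy loss for illustration
scaler.scale(loss).backward()                     # scale to avoid fp16 underflow
scaler.step(optimizer)
scaler.update()
```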
For detailed usage instructions, see our [GitHub repository](https://github.com/asd765973346/xlm-swcm).
## Supported Languages
Primary focus on Chinese minority languages:
* Tibetan (bo)
* Uyghur (ug)
* Kazakh (kk)
* Mongolian (mn)
* Chinese (zh)
## Citation
```
@article{su2025multilingualencoderknowsrealize,
author = {Zeli Su and Ziyin Zhang and Guixian Xu and Jianing Liu and Xu Han and Ting Zhang and Yushuang Dong},
title = {Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining
for Extremely Low-Resource Languages},
journal = {CoRR},
volume = {abs/2502.10852},
year = {2025},
url = {https://doi.org/10.48550/arXiv.2502.10852},
doi = {10.48550/ARXIV.2502.10852},
eprinttype = {arXiv},
eprint = {2502.10852}
}
```