---
license: apache-2.0
datasets:
- pkupie/mc2_corpus
language:
- bo
- ug
- kk
- mn
- zh
base_model:
- hfl/cino-base-v2
---

# XLM-SWCM: Multilingual Encoder with Shared Weights Pretraining

## Overview

XLM-SWCM is a sequence-to-sequence model designed for extremely low-resource languages. It introduces a weight-sharing mechanism between encoder and decoder, enabling effective knowledge transfer from a pretrained multilingual encoder to text generation tasks.
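
The core idea is reusing pretrained encoder weights to initialize decoder modules. A minimal generic sketch; the module names and shapes below are illustrative, not the released code:

```python
import torch.nn as nn

encoder_ffn = nn.Linear(768, 3072)  # stands in for a pretrained encoder sub-module
decoder_ffn = nn.Linear(768, 3072)  # matching decoder slot, randomly initialized

# Copy (not tie) the encoder weights into the decoder slot, so the decoder
# starts from the encoder's multilingual knowledge instead of random init.
decoder_ffn.load_state_dict(encoder_ffn.state_dict())
```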

## Key Innovations

* **Shared Weight Framework**: Strategic weight reuse between encoder and decoder layers
* **Hybrid Decoder Architecture**, combining (see the sketch after this list):
  * Standard transformer decoder layers
  * Custom decoder layers with a dual FFN structure
  * An optimized insertion pattern: one normal layer for every three custom layers
* **Efficient Adaptation**: Enables effective text generation with minimal training data
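
To make the hybrid decoder concrete, here is a minimal PyTorch sketch of a custom decoder layer with a dual FFN structure. It is an illustration under assumptions, not the authors' implementation: dimensions follow XLM-R base, the self-attention and first FFN are the slots that would be initialized from the corresponding encoder layer, and the cross-attention and second FFN are new decoder-specific modules.

```python
import torch
import torch.nn as nn

D_MODEL, N_HEADS, D_FF = 768, 12, 3072  # XLM-R / CINO base dimensions

class CustomDecoderLayer(nn.Module):
    """Sketch of a weight-shared decoder layer with a dual FFN structure."""

    def __init__(self):
        super().__init__()
        # These two would receive the encoder layer's pretrained weights;
        # randomly initialized here to keep the sketch self-contained.
        self.self_attn = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)
        self.ffn1 = self._ffn()
        # Decoder-specific additions: cross-attention and a second FFN.
        self.cross_attn = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)
        self.ffn2 = self._ffn()
        self.norms = nn.ModuleList(nn.LayerNorm(D_MODEL) for _ in range(4))

    @staticmethod
    def _ffn():
        return nn.Sequential(
            nn.Linear(D_MODEL, D_FF), nn.GELU(), nn.Linear(D_FF, D_MODEL)
        )

    def forward(self, x, memory, tgt_mask=None):
        # Pre-norm residual blocks; the exact block order is an assumption.
        h = self.norms[0](x)
        x = x + self.self_attn(h, h, h, attn_mask=tgt_mask, need_weights=False)[0]
        x = x + self.ffn1(self.norms[1](x))
        h = self.norms[2](x)
        x = x + self.cross_attn(h, memory, memory, need_weights=False)[0]
        x = x + self.ffn2(self.norms[3](x))
        return x
```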

## Model Architecture

| Component      | Description |
| -------------- | ----------- |
| **Encoder**    | XLM-RoBERTa base (CINO v2 variant) |
| **Decoder**    | Hybrid transformer with:<br>• `NormalDecoderLayer`: randomly initialized standard layers<br>• `CustomDecoderLayer`: weight-shared layers with a dual FFN structure |
| **Parameters** | 492M total |
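
Continuing the sketch above (reusing `CustomDecoderLayer`, `D_MODEL`, `N_HEADS`, and `D_FF`), the one-normal-per-three-custom insertion pattern could be assembled as follows. Here `nn.TransformerDecoderLayer` stands in for `NormalDecoderLayer`, and the layer count is an assumption, not the released configuration:

```python
import torch.nn as nn

def build_decoder_stack(num_custom: int = 9) -> nn.ModuleList:
    """Insert one randomly initialized standard layer after every
    three weight-shared custom layers (the 1:3 pattern above)."""
    layers = nn.ModuleList()
    for i in range(num_custom):
        layers.append(CustomDecoderLayer())
        if (i + 1) % 3 == 0:
            layers.append(nn.TransformerDecoderLayer(
                d_model=D_MODEL, nhead=N_HEADS, dim_feedforward=D_FF,
                batch_first=True, norm_first=True))
    return layers

decoder = build_decoder_stack()
print(len(decoder))  # 12 layers: 9 custom + 3 normal
```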

### Advanced Features

* Beam search decoding (usage sketch below)
* Mixed-precision training
* Cross-lingual transfer learning
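
As a usage sketch, beam search decoding might look like the following with the Hugging Face `transformers` API. The model path is a placeholder and loading via `AutoModelForSeq2SeqLM` is an assumption; see the GitHub repository linked below for the actual checkpoint and any custom modeling code it requires.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_PATH = "path/to/xlm-swcm"  # placeholder, not a published repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSeq2SeqLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float16,  # mixed precision, assuming GPU support
)

inputs = tokenizer("Example source text.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=5,        # beam search decoding
    max_new_tokens=128,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```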

For detailed usage instructions, see our [GitHub repository](https://github.com/asd765973346/xlm-swcm).

## Supported Languages

Primary focus on Chinese minority languages:

* Tibetan (bo)
* Uyghur (ug)
* Kazakh (kk)
* Mongolian (mn)
* Chinese (zh)

## Citation

```bibtex
@article{su2025multilingualencoderknowsrealize,
  author     = {Zeli Su and Ziyin Zhang and Guixian Xu and Jianing Liu and Xu Han and Ting Zhang and Yushuang Dong},
  title      = {Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages},
  journal    = {CoRR},
  volume     = {abs/2502.10852},
  year       = {2025},
  url        = {https://doi.org/10.48550/arXiv.2502.10852},
  doi        = {10.48550/ARXIV.2502.10852},
  eprinttype = {arXiv},
  eprint     = {2502.10852}
}
```