XLM-SWCM: Multilingual Encoder with Shared Weights Pretraining
Overview
XLM-SWCM (Cross-lingual Language Model with Shared Weights Cross-lingual Modeling) is a sequence-to-sequence model designed specifically for extremely low-resource languages. The framework introduces a novel weight-sharing mechanism between the encoder and decoder, enabling effective knowledge transfer from pretrained multilingual encoders to text generation tasks.
Key Innovations
- Shared Weight Framework: Strategic weight reuse between encoder and decoder layers
- Hybrid Decoder Architecture (see the sketch after this list), combining:
  - Standard transformer decoder layers
  - Custom decoder layers with a dual FFN structure
  - An optimized layer insertion pattern (one normal layer for every three custom layers)
- Efficient Adaptation: Enables effective text generation with minimal training data
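The PyTorch sketch below illustrates how a dual-FFN custom layer and the one-normal-per-three-custom insertion pattern could be wired together. Class names, dimensions, and the layer wiring are illustrative assumptions for this card, not the released implementation.

```python
# Illustrative sketch of the hybrid decoder stack; not the released code.
import torch
import torch.nn as nn


class NormalDecoderLayer(nn.Module):
    """Standard transformer decoder layer, randomly initialized."""

    def __init__(self, d_model=768, n_heads=12, d_ff=3072):
        super().__init__()
        self.layer = nn.TransformerDecoderLayer(
            d_model, n_heads, dim_feedforward=d_ff, batch_first=True
        )

    def forward(self, x, memory):
        return self.layer(x, memory)


class CustomDecoderLayer(nn.Module):
    """Decoder layer with a dual FFN: one FFN meant to reuse encoder weights
    (weight copying omitted here), one randomly initialized."""

    def __init__(self, d_model=768, n_heads=12, d_ff=3072):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn_shared = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.ffn_new = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))

    def forward(self, x, memory):
        x = self.norms[0](x + self.self_attn(x, x, x, need_weights=False)[0])
        x = self.norms[1](x + self.cross_attn(x, memory, memory, need_weights=False)[0])
        x = self.norms[2](x + self.ffn_shared(x))
        x = self.norms[3](x + self.ffn_new(x))
        return x


def build_hybrid_decoder(num_custom=9, custom_per_normal=3):
    """Insert one normal layer after every `custom_per_normal` custom layers."""
    layers = []
    for i in range(num_custom):
        layers.append(CustomDecoderLayer())
        if (i + 1) % custom_per_normal == 0:
            layers.append(NormalDecoderLayer())
    return nn.ModuleList(layers)


# Quick shape check of the assembled stack.
decoder = build_hybrid_decoder()
x, memory = torch.randn(2, 16, 768), torch.randn(2, 32, 768)
for layer in decoder:
    x = layer(x, memory)
print(x.shape)  # torch.Size([2, 16, 768])
```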
Model Architecture
Component | Description |
---|---|
Encoder | XLM-RoBERTa base (CINO v2 variant) |
Decoder | Hybrid transformer combining NormalDecoderLayer (randomly initialized standard layers) and CustomDecoderLayer (weight-shared layers with a dual FFN structure) |
Parameters | 492M total |
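Assuming the checkpoint is distributed as a Hugging Face seq2seq model, loading could look like the sketch below; the repository id and the `trust_remote_code` flag are placeholders/assumptions, since the published checkpoint name is not given here.

```python
# Hypothetical loading snippet; replace the placeholder repo id with the real one.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "your-org/xlm-swcm"  # placeholder repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)
```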
Advanced Features
- Beam search decoding
- Mixed-precision training
- Cross-lingual transfer learning
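As a minimal sketch of the first two features, the snippet below runs beam-search generation under autocast mixed precision, assuming the standard transformers `generate` API and the `model`/`tokenizer` objects from the loading example above (mixed precision is listed as a training feature; here it is applied at generation time for brevity).

```python
# Beam-search decoding with mixed precision; assumes `model` and `tokenizer`
# from the hypothetical loading snippet above.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

text = "Input sentence in one of the supported languages."
inputs = tokenizer(text, return_tensors="pt").to(device)

amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.no_grad(), torch.autocast(device_type=device, dtype=amp_dtype):
    output_ids = model.generate(
        **inputs,
        num_beams=5,          # beam search decoding
        max_new_tokens=128,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```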
For detailed usage instructions, see our GitHub repository.
Supported Languages
Primary focus on Chinese minority languages:
- Tibetan (bo)
- Uyghur (ug)
- Kazakh (kk)
- Mongolian (mn)
- Chinese (zh)
Citation
@article{swcm,
author = {Zeli Su and Ziyin Zhang and Guixian Xu and Jianing Liu and XU Han and Ting Zhang and Yushuang Dong},
title = {Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages},
year = {2025},
url = {http://dx.doi.org/10.13140/RG.2.2.11262.09285},
}