---
license: apache-2.0
datasets:
- pkupie/mc2_corpus
language:
- bo
- ug
- kk
- mn
- zh
base_model:
- hfl/cino-base-v2
---

# XLM-SWCM: Multilingual Encoder with Shared Weights Pretraining

## Overview

XLM-SWCM (Cross-lingual Language Model with Shared Weights Cross-lingual Modeling) is a sequence-to-sequence model designed to address the challenges of extremely low-resource languages. The framework introduces a novel weight-sharing mechanism between the encoder and decoder, enabling effective knowledge transfer from a pretrained multilingual encoder to generation tasks.

## Key Innovations

* **Shared Weight Framework**: strategic weight reuse between encoder and decoder layers
* **Hybrid Decoder Architecture**, which combines:
  * standard transformer decoder layers
  * custom decoder layers with a dual-FFN structure
  * an optimized layer insertion pattern (one normal layer per three custom layers); see the architecture sketch at the end of this card
* **Efficient Adaptation**: effective text generation with minimal training data

## Model Architecture

| Component      | Description                                                             |
| -------------- | ----------------------------------------------------------------------- |
| **Encoder**    | XLM-RoBERTa base (CINO v2 variant)                                      |
| **Decoder**    | Hybrid transformer with:                                                 |
|                | • `NormalDecoderLayer`: randomly initialized standard layers            |
|                | • `CustomDecoderLayer`: weight-shared layers with a dual-FFN structure  |
| **Parameters** | 492M total                                                               |

### Advanced Features

* Beam search decoding (see the usage sketch at the end of this card)
* Mixed-precision training
* Cross-lingual transfer learning

For detailed usage instructions, see our [GitHub repository](https://github.com/asd765973346/xlm-swcm).

## Supported Languages

The model focuses primarily on Chinese minority languages:

* Tibetan (bo)
* Uyghur (ug)
* Kazakh (kk)
* Mongolian (mn)
* Chinese (zh)

## Citation

```
@article{su2025multilingualencoderknowsrealize,
  author     = {Zeli Su and Ziyin Zhang and Guixian Xu and Jianing Liu and Xu Han and Ting Zhang and Yushuang Dong},
  title      = {Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages},
  journal    = {CoRR},
  volume     = {abs/2502.10852},
  year       = {2025},
  url        = {https://doi.org/10.48550/arXiv.2502.10852},
  doi        = {10.48550/ARXIV.2502.10852},
  eprinttype = {arXiv},
  eprint     = {2502.10852}
}
```
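
## Architecture Sketch

The hybrid decoder described above can be illustrated with a minimal PyTorch sketch. This is not the released implementation (see the GitHub repository for that): the class names, the exact dual-FFN wiring, and the way encoder weights are copied are assumptions made for illustration, following only the description in this card.

```python
# Minimal sketch of the hybrid decoder described in this card (NOT the official code).
# Assumptions: each custom layer reuses a pretrained encoder layer for its
# self-attention/FFN path and adds new cross-attention plus a second FFN;
# one randomly initialized "normal" layer follows every three custom layers.
import copy
import torch.nn as nn


class CustomDecoderLayer(nn.Module):
    """Decoder layer that shares weights with a pretrained encoder layer (dual FFN)."""

    def __init__(self, encoder_layer, d_model=768, n_heads=12, d_ff=3072):
        super().__init__()
        # Weight-shared part: copied from the multilingual encoder.
        self.shared_block = copy.deepcopy(encoder_layer)
        # Newly initialized parts: cross-attention and a second feed-forward network.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_norm = nn.LayerNorm(d_model)
        self.ffn2 = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.ffn2_norm = nn.LayerNorm(d_model)


def build_hybrid_decoder(encoder_layers, normal_every=3, d_model=768, n_heads=12):
    """Interleave one randomly initialized standard decoder layer
    after every `normal_every` weight-shared custom layers."""
    layers = nn.ModuleList()
    for i, enc_layer in enumerate(encoder_layers):
        layers.append(CustomDecoderLayer(enc_layer, d_model, n_heads))
        if (i + 1) % normal_every == 0:
            layers.append(
                nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
            )
    return layers
```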
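
## Usage Sketch

This card lists beam search decoding as a supported feature but does not include a loading snippet. The following is a hedged sketch that assumes the checkpoint can be loaded through the standard Hugging Face Transformers seq2seq interface; in practice, loading may require the custom model code from the GitHub repository, and the repository ID below is a placeholder.

```python
# Hedged usage sketch: assumes a standard Transformers seq2seq interface.
# The actual loading path may require the custom code from the linked GitHub repo.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "path/to/xlm-swcm-checkpoint"  # placeholder: replace with the real Hub repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Input text in any of the supported languages (bo, ug, kk, mn, zh).
inputs = tokenizer("Example input text.", return_tensors="pt")

# Beam search decoding, as listed under "Advanced Features".
outputs = model.generate(
    **inputs,
    num_beams=4,         # beam width (illustrative value)
    max_new_tokens=128,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```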