
English | Chinese

translate100

translate100 is a Transformer-based, sequence-to-sequence neural machine translation model for translation tasks. It was derived from m2m100 (12B) through distillation (small100) and further processing, and ships as a one-click-deploy application fully adapted to translate.js.
Its translation quality is modest; its main strengths are running on extremely low-spec machines (1 CPU core, 2 GB RAM) and covering more than a hundred of the world's mainstream languages.

  1. In CPU-only environments, on CPUs that support quantization instructions, Linear layers are quantized to int8 to speed up inference.
  2. Gradient computation is disabled and other autograd-engine features are turned off, further improving performance and reducing memory usage.
  3. Mixed-precision acceleration: when a GPU is detected, FP16/FP32 precision is chosen dynamically, cutting VRAM usage and raising throughput while preserving accuracy.
  4. Optimization strategies switch automatically between CPU and GPU environments, so the best performance is obtained on any hardware (see the sketch after this list).
  5. It exposes a standard text-translation API: one-click deployment, zero configuration, usable on very low-spec cloud servers (e.g., 1 core, 2 GB RAM).
  6. Supports translation across 100 languages.
  7. No need to specify the source language: the language of the input text is detected automatically and the translation is returned.
  8. A single sentence may contain multiple languages.
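
A minimal PyTorch sketch of the CPU/GPU strategy in points 1-4 (illustrative only, not translate100's actual code; the model path is an assumption taken from the Linux deployment section below):

import contextlib

import torch
from transformers import AutoModelForSeq2SeqLM

# Load the distilled model; the path is the directory the deployment
# section below downloads the model files into (an assumption).
model = AutoModelForSeq2SeqLM.from_pretrained("/mnt/translate100/")
model.eval()

if torch.cuda.is_available():
    model = model.to("cuda")
    # GPU path, mixed precision: autocast runs safe ops in FP16, the rest in FP32.
    precision_ctx = torch.autocast("cuda", dtype=torch.float16)
else:
    # CPU path: int8-quantize the Linear layers (needs a quantization-capable CPU).
    model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    precision_ctx = contextlib.nullcontext()

# Disable gradient tracking and the rest of the autograd engine for inference.
with torch.inference_mode(), precision_ctx:
    ...  # model.generate(...) would run here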

Using the API

It is primarily offered as an open API, consumed through ordinary HTTP requests.

/language.json Language list

cURL request:

curl --request GET \
  --url http://127.0.0.1/language.json 

Response:

{
  "info": "success",
  "list": [
    {
      "id": "afrikaans",
      "name": "\u5357\u975e\u8377\u5170\u8bed",
      "serviceId": "af"
    },
    {
      "id": "amharic",
      "name": "\u963f\u59c6\u54c8\u62c9\u8bed",
      "serviceId": "am"
    },
    {
      "id": "english",
      "name": "\u82f1\u8bed",
      "serviceId": "en"
    },
    {
      "id": "chinese_simplified",
      "name": "\u7b80\u4f53\u4e2d\u6587",
      "serviceId": "zh"
    },
    
    ......

  ],
  "result": 1
}
  • result 1 means success, 0 means failure. If 0 is returned, the failure reason is available in info.
  • list The language list; each element is one language, containing:
    • id Language identifier, e.g. english
    • name Language name (Chinese display name), e.g. 英语
    • serviceId Language code used by the underlying translation service, e.g. en
  • info Failure message; returned only when result is 0.
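
The same request from Python, as a minimal example (assumes the requests package is installed):

import requests

# Fetch the list of supported languages from a locally running instance.
resp = requests.get("http://127.0.0.1/language.json", timeout=10)
data = resp.json()

if data["result"] == 1:
    for lang in data["list"]:
        print(lang["id"], lang["serviceId"], lang["name"])
else:
    print("failed:", data.get("info"))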

/translate.json Translation endpoint

cURL request:

curl --request POST \
  --url http://127.0.0.1/translate.json \
  --data 'text=["你好","世界"]' \
  --data to=english
  • text The text to translate. A single text may be passed as a plain string; multiple texts may be passed as an array, one element per text.
  • to Target language, e.g. english. The available values can be obtained from /language.json.

Response:

{
    "result": 1,
    "text": [
        "Hello",
        "The world"
    ],
    "time": 37,
    "to": "english",
    "tokens": 4
}
  • result 1 means success, 0 means failure. If 0 is returned, the failure reason is available in info.
  • text The translated text; always an array.
  • to Target language, matching the to that was passed in.
  • time Translation time, in milliseconds.
  • tokens Number of tokens in the submitted text.
  • info Failure message; returned only when result is 0.
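
And the translation call from Python (same assumptions as above; text is form-encoded exactly as in the cURL example):

import requests

# Translate two texts to English; text is a JSON-style array passed as form data.
payload = {"text": '["你好","世界"]', "to": "english"}
resp = requests.post("http://127.0.0.1/translate.json", data=payload, timeout=30)
data = resp.json()

if data["result"] == 1:
    print(data["text"])                      # e.g. ["Hello", "The world"]
    print(data["time"], "ms,", data["tokens"], "tokens")
else:
    print("failed:", data.get("info"))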

/ Home page and health check

You can open http://127.0.0.1 directly to see a welcome page: a simple HTML page with some basic information. If you run the service behind a load balancer, this page can also serve as the health-check endpoint.
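
If your load balancer needs a scripted probe, a minimal sketch (a hypothetical helper, not part of translate100) could treat any HTTP 200 from this page as healthy:

import requests

def is_healthy(base_url="http://127.0.0.1"):
    # Hypothetical probe: any HTTP 200 from the welcome page counts as healthy.
    try:
        return requests.get(base_url, timeout=5).status_code == 200
    except requests.RequestException:
        return False

print("healthy" if is_healthy() else "unhealthy")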

Performance tests

CPU              Accelerated   RAM usage   Translation speed
Intel i7 7700K   yes           2250 MB     43 token/s
Intel i7 7700K   no            970 MB      12 token/s

GPU              Accelerated   VRAM usage  Translation speed
NVIDIA 1050 Ti   yes           700 MB      83 token/s
NVIDIA 1050 Ti   no            700 MB      55 token/s

Minimum requirements

An ordinary, very low-spec cloud server with 1 CPU core and 2 GB of RAM is enough; it achieves a translation rate above 1 token/s.

Private deployment

Linux deployment

The following installs on CentOS 7.4 and serves on port 80:

#sudo yum update -y
yum -y install wget
mkdir -p /mnt/translate100/
# Download the prebuilt Python package
wget http://down.zvo.cn/centos/cpython-3.10.18+20250712-x86_64-unknown-linux-gnu-install_only.tar.gz -O /mnt/translate100/cpython-3.10.18.tar.gz
cd /mnt/translate100/
yum -y install tar
tar -xzf /mnt/translate100/cpython-3.10.18.tar.gz
rm -rf /mnt/translate100/cpython-3.10.18.tar.gz

# Configure environment variables
export PATH="/mnt/translate100/python/bin:$PATH"
# Persist the PATH across reboots
echo 'PATH="/mnt/translate100/python/bin:$PATH"'>>/etc/rc.d/rc.local
# Print the current Python version; it should be 3.10.18
python --version

# Install dependencies, using the Aliyun mirror
pip install -i https://mirrors.aliyun.com/pypi/simple/ flask flask-cors transformers sentencepiece py-cpuinfo
#pip install -i https://mirrors.aliyun.com/pypi/simple/ --upgrade pip
pip install -i https://mirrors.aliyun.com/pypi/simple/ torch
# If computing on an NVIDIA GPU, install instead:
#pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128

# Permanently disable the firewall (remove it from autostart)
sudo systemctl disable firewalld
sudo systemctl stop firewalld

wget https://hf-mirror.com/xnx3/translate100/resolve/main/pytorch_model.bin?download=true -O /mnt/translate100/pytorch_model.bin
wget https://hf-mirror.com/xnx3/translate100/resolve/main/config.json?download=true -O /mnt/translate100/config.json
wget https://hf-mirror.com/xnx3/translate100/resolve/main/sentencepiece.bpe.model?download=true -O /mnt/translate100/sentencepiece.bpe.model
wget https://hf-mirror.com/xnx3/translate100/resolve/main/special_tokens_map.json?download=true -O /mnt/translate100/special_tokens_map.json
wget https://hf-mirror.com/xnx3/translate100/resolve/main/tokenization_small100.py?download=true -O /mnt/translate100/tokenization_small100.py
wget https://hf-mirror.com/xnx3/translate100/resolve/main/tokenizer_config.json?download=true -O /mnt/translate100/tokenizer_config.json
wget https://hf-mirror.com/xnx3/translate100/resolve/main/translate100.py?download=true -O /mnt/translate100/translate100.py
wget https://hf-mirror.com/xnx3/translate100/resolve/main/vocab.json?download=true -O /mnt/translate100/vocab.json
wget https://hf-mirror.com/xnx3/translate100/resolve/main/start.sh?download=true -O /mnt/translate100/start.sh
mkdir /mnt/translate100/resources/
wget https://hf-mirror.com/xnx3/translate100/resolve/main/resources/translate.js?download=true -O /mnt/translate100/resources/translate.js

# Add auto start upon startup
echo '/mnt/translate100/start.sh'>>/etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local
chmod +x /mnt/translate100/start.sh
chmod -R 777 /mnt/translate100/start.sh
# Run
/mnt/translate100/start.sh

Windows deployment

Download the application:
https://huggingface.co/xnx3/translate100/resolve/main/translate100.exe?download=true
If you are in China, use this download address instead: https://hf-mirror.com/xnx3/translate100/resolve/main/translate100.exe?download=true

Double-click it to run. (Note: after launch it spends a few minutes loading, during which it may appear unresponsive; it is loading the model data, so please be patient.)

Configuration parameters

Runtime behavior of translate100 is controlled through system environment variables.
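
As a sketch, the variables might be read on the server side like this (the defaults mirror the sections below; this is an assumption, not translate100's actual source):

import os

# Hypothetical parsing of the supported environment variables.
port = int(os.environ.get("TRANSLATE100_PORT", "80"))               # default 80
use_gpu = os.environ.get("TRANSLATE100_USE_GPU", "true").lower() == "true"
quick = os.environ.get("TRANSLATE100_QUICK", "").lower() == "true"  # unset = off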

TRANSLATE100_PORT

The port number the service listens on, set via this environment variable; default 80.

TRANSLATE100_PORT=80

TRANSLATE100_USE_GPU

Whether to use the GPU, set via this environment variable; default true.
Set it to false to force CPU execution even when a GPU is present.

TRANSLATE100_USE_GPU=true

TRANSLATE100_QUICK

Whether to enable quick mode; if unset (the default), it is disabled.
Quick mode speeds translation up noticeably, at a slight cost in translation quality.

  • When running on CPU (see the Python sketch after the setting below)
    • It checks whether the CPU supports the AVX2 instruction set; if so, Linear layers are automatically quantized to int8 to speed up inference.
    • It checks whether channels_last is supported; if so, the model is converted automatically to speed up inference.
    • Beam search is disabled; only greedy search is used.
  • When running on an NVIDIA GPU
    • Half-precision (FP16) quantization
    • Resource utilization is optimized and performance is tuned adaptively
    • Beam search is disabled; only greedy search is used.
TRANSLATE100_QUICK=true
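
A minimal sketch of the CPU-side quick-mode checks, using py-cpuinfo (installed in the Linux deployment above) and PyTorch; illustrative only, not translate100's actual implementation:

import cpuinfo  # provided by the py-cpuinfo package
import torch

def apply_quick_mode(model):
    """Apply the quick-mode CPU optimizations described above (a sketch)."""
    flags = cpuinfo.get_cpu_info().get("flags", [])
    if "avx2" in flags:
        # AVX2 available: int8-quantize the Linear layers.
        model = torch.quantization.quantize_dynamic(
            model, {torch.nn.Linear}, dtype=torch.qint8
        )
    # Convert to the channels_last memory format where supported.
    return model.to(memory_format=torch.channels_last)

# Greedy search instead of beam search is then a generation-time setting:
# outputs = model.generate(**inputs, num_beams=1)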
Example: setting the port on Windows

As an example, here is how to change the port from the default 80 to 8000 when running on Windows.
First, open a cmd window and enter:

set TRANSLATE100_PORT=8000

Then double-click translate100.exe to run it; once startup completes, the service can be reached on port 8000.

Citation

If you use this model for your research, please cite the following work:

@inproceedings{mohammadshahi-etal-2022-small,
    title = "{SM}a{LL}-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages",
    author = "Mohammadshahi, Alireza  and
      Nikoulina, Vassilina  and
      Berard, Alexandre  and
      Brun, Caroline  and
      Henderson, James  and
      Besacier, Laurent",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.571",
    pages = "8348--8359",
    abstract = "In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the {``}curse of multilinguality{''}, these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introduce SMaLL-100, a distilled version of the M2M-100(12B) model, a massively multilingual machine translation model covering 100 languages. We train SMaLL-100 with uniform sampling across all language pairs and therefore focus on preserving the performance of low-resource languages. We evaluate SMaLL-100 on different low-resource benchmarks: FLORES-101, Tatoeba, and TICO-19 and demonstrate that it outperforms previous massively multilingual models of comparable sizes (200-600M) while improving inference latency and memory usage. Additionally, our model achieves comparable results to M2M-100 (1.2B), while being 3.6x smaller and 4.3x faster at inference.",
}

@inproceedings{mohammadshahi-etal-2022-compressed,
    title = "What Do Compressed Multilingual Machine Translation Models Forget?",
    author = "Mohammadshahi, Alireza  and
      Nikoulina, Vassilina  and
      Berard, Alexandre  and
      Brun, Caroline  and
      Henderson, James  and
      Besacier, Laurent",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.317",
    pages = "4308--4329",
    abstract = "Recently, very large pre-trained models achieve state-of-the-art results in various natural language processing (NLP) tasks, but their size makes it more challenging to apply them in resource-constrained environments. Compression techniques allow to drastically reduce the size of the models and therefore their inference time with negligible impact on top-tier metrics. However, the general performance averaged across multiple tasks and/or languages may hide a drastic performance drop on under-represented features, which could result in the amplification of biases encoded by the models. In this work, we assess the impact of compression methods on Multilingual Neural Machine Translation models (MNMT) for various language groups, gender, and semantic biases by extensive analysis of compressed models on different machine translation benchmarks, i.e. FLORES-101, MT-Gender, and DiBiMT. We show that the performance of under-represented languages drops significantly, while the average BLEU metric only slightly decreases. Interestingly, the removal of noisy memorization with compression leads to a significant improvement for some medium-resource languages. Finally, we demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages.",
}