translate100
Translate100 is a sequence-to-sequence, Transformer-based neural machine translation model for translation tasks. It is a one-click-deployment application, fully adapted to translate.js, obtained by taking SMaLL-100, a distillation of M2M-100 (12B), and further processing it.
Its translation quality is average; its defining feature is that it runs on ultra-low-spec machines (1 CPU core, 2 GB of memory) while covering roughly one hundred mainstream languages worldwide.
- In GPU-free scenarios, on CPUs that support the relevant instruction sets, int8 quantization is applied to the Linear layers to improve inference speed (see the sketch after this list).
- Gradient calculation and the rest of the automatic-differentiation engine are disabled, further improving performance and reducing memory usage.
- Mixed-precision acceleration: when a GPU is detected, FP16/FP32 precision is selected dynamically, reducing memory usage and improving throughput while preserving accuracy.
- Automatic switching between CPU and GPU environments, so performance stays optimal on different hardware.
- An open, standard text-translation interface with one-click deployment and no configuration required; it can run on ultra-low-spec cloud servers (such as 1 core, 2 GB of memory).
- Supports translation into 100 languages.
- No need to specify the source language: the language of the input text is detected automatically and the translation is returned.
- Supports multiple languages within a single sentence.
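The following is a minimal sketch of how these optimizations are typically wired up in PyTorch. It assumes the model files are in /mnt/translate100/ (as in the deployment script below) and uses the SMALL100Tokenizer from the downloaded tokenization_small100.py; it is illustrative, not translate100's exact internal code.
import torch
from transformers import M2M100ForConditionalGeneration
from tokenization_small100 import SMALL100Tokenizer  # shipped with the model files

model = M2M100ForConditionalGeneration.from_pretrained("/mnt/translate100/")
tokenizer = SMALL100Tokenizer.from_pretrained("/mnt/translate100/")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

if device.type == "cpu":
    # int8 dynamic quantization of the Linear layers (CPU-only path)
    model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

tokenizer.tgt_lang = "en"  # target language code
encoded = tokenizer("你好", return_tensors="pt").to(device)

with torch.inference_mode():  # autograd engine fully disabled
    if device.type == "cuda":
        # dynamic FP16/FP32 mixed precision on GPU
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            generated = model.generate(**encoded)
    else:
        generated = model.generate(**encoded)

print(tokenizer.batch_decode(generated, skip_special_tokens=True))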
Using the API
The service is primarily used through its open API, via standard HTTP requests.
/language.json - List of supported languages
cURL request:
curl --request GET \
--url http://127.0.0.1/language.json
Response:
{
"info": "success",
"list": [
{
"id": "afrikaans",
"name": "\u5357\u975e\u8377\u5170\u8bed",
"serviceId": "af"
},
{
"id": "amharic",
"name": "\u963f\u59c6\u54c8\u62c9\u8bed",
"serviceId": "am"
},
{
"id": "english",
"name": "\u82f1\u8bed",
"serviceId": "en"
},
{
"id": "chinese_simplified",
"name": "\u7b80\u4f53\u4e2d\u6587",
"serviceId": "zh"
},
......
],
"result": 1
}
- result 1 represents success, 0 represents failure. If 0 is returned, the failure message can be obtained from info.
- list The list of languages; each element is one language, containing:
  - id The language identifier, such as: english
  - name The language name, such as: English
- info The failure message; returned only when result is 0.
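The endpoint can also be called programmatically from any HTTP client. Here is a minimal Python sketch, assuming the service is running locally on port 80 and the requests package is installed:
import requests

data = requests.get("http://127.0.0.1/language.json").json()
if data["result"] == 1:
    for lang in data["list"]:
        print(lang["id"], lang["name"])
else:
    print("failed:", data["info"])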
/translate.json - Text translation API
cURL request:
curl --request POST \
--url http://127.0.0.1/translate.json \
--data 'text=["你好","世界"]' \
--data to=english
- text The text to translate. A single text can be passed as a string; multiple texts can be passed as a JSON array, with each element being one text to translate.
- to The target language, such as: english. Valid values are the id fields returned by /language.json.
Response:
{
"result": 1,
"text": [
"Hello",
"The world"
],
"time": 37,
"to": "english",
"tokens": 4
}
- result 1 represents success, 0 represents failure. If 0 is returned, the failure message can be obtained from info.
- text The translated texts, always returned as an array.
- to The target language, echoing the to parameter that was passed in.
- time Translation time, in milliseconds.
- tokens The number of tokens used to translate the submitted content.
- info The failure message; returned only when result is 0.
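The same request in Python (a minimal sketch, assuming the service is running locally on port 80):
import requests

resp = requests.post(
    "http://127.0.0.1/translate.json",
    data={"text": '["你好","世界"]', "to": "english"},
)
data = resp.json()
if data["result"] == 1:
    print(data["text"])  # ['Hello', 'The world']
else:
    print("failed:", data["info"])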
/ - Home page and health check
Visiting http://127.0.0.1 directly shows a welcome page: a simple HTML page containing some basic information. If you need a health-check endpoint for load balancing, this page can be used for that purpose as well.
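For example, a monitoring script might probe it like this (a minimal sketch):
import requests

def is_healthy(base_url="http://127.0.0.1"):
    try:
        return requests.get(base_url, timeout=5).status_code == 200
    except requests.RequestException:
        return False

print(is_healthy())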
Performance testing
CPU computation | Acceleration | Memory usage | Translation speed
---|---|---|---
Intel i7 7700K | enabled | 2250 MB | 43 token/s
Intel i7 7700K | disabled | 970 MB | 12 token/s

GPU computation | Acceleration | GPU memory usage | Translation speed
---|---|---|---
NVIDIA 1050 Ti | enabled | 700 MB | 83 token/s
NVIDIA 1050 Ti | disabled | 700 MB | 55 token/s
Minimum operating requirements
Even a very low-spec cloud server with 1 CPU core and 2 GB of memory can run it, at a translation rate exceeding 1 token/s.
Private deployment
Linux
Here is one way to deploy on CentOS 7.4, serving on port 80:
#sudo yum update -y
yum -y install wget
mkdir -p /mnt/translate100/
# Download the pre-compiled Python package
wget http://down.zvo.cn/centos/cpython-3.10.18+20250712-x86_64-unknown-linux-gnu-install_only.tar.gz -O /mnt/translate100/cpython-3.10.18.tar.gz
cd /mnt/translate100/
yum -y install tar
tar -xzf /mnt/translate100/cpython-3.10.18.tar.gz
rm -rf /mnt/translate100/cpython-3.10.18.tar.gz
# Configure environment variables
export PATH="/mnt/translate100/python/bin:$PATH"
# Persist the PATH setting so it is also applied at boot
echo 'PATH="/mnt/translate100/python/bin:$PATH"'>>/etc/rc.d/rc.local
# Print the current Python version, which should be 3.10.18
python --version
# Install dependencies, using Alibaba's PyPI mirror
pip install -i https://mirrors.aliyun.com/pypi/simple/ flask flask-cors transformers sentencepiece py-cpuinfo
#pip install -i https://mirrors.aliyun.com/pypi/simple/ --upgrade pip
pip install -i https://mirrors.aliyun.com/pypi/simple/ torch
# If computing on an NVIDIA GPU, the CUDA build must be installed instead:
#pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
# Permanently turn off the firewall (stop it and disable it at boot) so port 80 is reachable
sudo systemctl disable firewalld
sudo systemctl stop firewalld
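# Download the model, tokenizer, and service files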
wget https://hf-mirror.com/xnx3/translate100/resolve/main/pytorch_model.bin?download=true -O /mnt/translate100/pytorch_model.bin
wget https://hf-mirror.com/xnx3/translate100/resolve/main/config.json?download=true -O /mnt/translate100/config.json
wget https://hf-mirror.com/xnx3/translate100/resolve/main/sentencepiece.bpe.model?download=true -O /mnt/translate100/sentencepiece.bpe.model
wget https://hf-mirror.com/xnx3/translate100/resolve/main/special_tokens_map.json?download=true -O /mnt/translate100/special_tokens_map.json
wget https://hf-mirror.com/xnx3/translate100/resolve/main/tokenization_small100.py?download=true -O /mnt/translate100/tokenization_small100.py
wget https://hf-mirror.com/xnx3/translate100/resolve/main/tokenizer_config.json?download=true -O /mnt/translate100/tokenizer_config.json
wget https://hf-mirror.com/xnx3/translate100/resolve/main/translate100.py?download=true -O /mnt/translate100/translate100.py
wget https://hf-mirror.com/xnx3/translate100/resolve/main/vocab.json?download=true -O /mnt/translate100/vocab.json
wget https://hf-mirror.com/xnx3/translate100/resolve/main/start.sh?download=true -O /mnt/translate100/start.sh
mkdir /mnt/translate100/resources/
wget https://hf-mirror.com/xnx3/translate100/resolve/main/resources/translate.js?download=true -O /mnt/translate100/resources/translate.js
# Start the service automatically at boot
echo '/mnt/translate100/start.sh'>>/etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local
chmod +x /mnt/translate100/start.sh
chmod -R 777 /mnt/translate100/start.sh
# Run
/mnt/translate100/start.sh
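Once start.sh is running (the first start loads the model and may take a while), you can verify the deployment from another shell; a minimal Python sketch:
import requests

r = requests.post("http://127.0.0.1/translate.json",
                  data={"text": "你好", "to": "english"})
print(r.json())  # expect "result": 1 on success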
Windows
Download the application:
Double-click it to run. (Note: after double-clicking, there is a loading phase of a few minutes during which the program may seem unresponsive. It is loading the model; please be patient.)
Configuration parameter
Relevant parameters can be set as system environment variables to control how translate100 runs.
TRANSLATE100_PORT
This environment variable sets the port the service listens on; the default is 80.
TRANSLATE100_PORT=80
TRANSLATE100_USE_GPU
This environment variable controls whether the GPU is used; the default is true. Setting it to false forces CPU computation even when a GPU is present.
TRANSLATE100_USE_GPU=true
TRANSLATE100_QUICK
This environment variable controls whether fast mode is used; it is disabled unless explicitly set.
Fast mode significantly speeds up translation at a slight cost in translation quality (a rough sketch follows the example below).
- If running on a CPU:
  - It checks whether the AVX2 instruction set is supported and, if so, automatically applies int8 quantization to the Linear layers to improve speed.
  - It checks whether channels_last is supported and, if so, automatically converts the model to that memory format to improve speed.
  - Beam search is not used; only greedy search.
- If running on a GPU (NVIDIA):
  - Half-precision (FP16) quantization.
  - Resource utilization is optimized with adaptive performance tuning.
  - Beam search is not used; only greedy search.
TRANSLATE100_QUICK=true
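A rough sketch of what fast mode implies, reusing model and encoded from the sketch in the feature list above (illustrative, not translate100's internal code):
import cpuinfo  # provided by the py-cpuinfo package installed during deployment

# The AVX2 check that gates int8 quantization on CPU
flags = cpuinfo.get_cpu_info().get("flags", [])
print("avx2 supported:", "avx2" in flags)

# Greedy search (num_beams=1): fastest decoding, what fast mode uses
fast_ids = model.generate(**encoded, num_beams=1)

# Beam search (num_beams>1): slower, usually slightly better quality
better_ids = model.generate(**encoded, num_beams=5)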
Example of Windows Port Number Setting
Here is an example of changing the default port from 80 to 8000 on a Windows system.
Firstly, open a cmd window and enter:
set TRANSLATE100_PORT=8000
Then run translate100.exe from that same cmd window (variables set with set only apply to processes launched from that window). After startup, the service accepts requests on port 8000.
Citation
If you use this model in your research, please cite the following works:
@inproceedings{mohammadshahi-etal-2022-small,
title = "{SM}a{LL}-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages",
author = "Mohammadshahi, Alireza and
Nikoulina, Vassilina and
Berard, Alexandre and
Brun, Caroline and
Henderson, James and
Besacier, Laurent",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-main.571",
pages = "8348--8359",
abstract = "In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the {``}curse of multilinguality{''}, these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introduce SMaLL-100, a distilled version of the M2M-100(12B) model, a massively multilingual machine translation model covering 100 languages. We train SMaLL-100 with uniform sampling across all language pairs and therefore focus on preserving the performance of low-resource languages. We evaluate SMaLL-100 on different low-resource benchmarks: FLORES-101, Tatoeba, and TICO-19 and demonstrate that it outperforms previous massively multilingual models of comparable sizes (200-600M) while improving inference latency and memory usage. Additionally, our model achieves comparable results to M2M-100 (1.2B), while being 3.6x smaller and 4.3x faster at inference.",
}
@inproceedings{mohammadshahi-etal-2022-compressed,
title = "What Do Compressed Multilingual Machine Translation Models Forget?",
author = "Mohammadshahi, Alireza and
Nikoulina, Vassilina and
Berard, Alexandre and
Brun, Caroline and
Henderson, James and
Besacier, Laurent",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.findings-emnlp.317",
pages = "4308--4329",
abstract = "Recently, very large pre-trained models achieve state-of-the-art results in various natural language processing (NLP) tasks, but their size makes it more challenging to apply them in resource-constrained environments. Compression techniques allow to drastically reduce the size of the models and therefore their inference time with negligible impact on top-tier metrics. However, the general performance averaged across multiple tasks and/or languages may hide a drastic performance drop on under-represented features, which could result in the amplification of biases encoded by the models. In this work, we assess the impact of compression methods on Multilingual Neural Machine Translation models (MNMT) for various language groups, gender, and semantic biases by extensive analysis of compressed models on different machine translation benchmarks, i.e. FLORES-101, MT-Gender, and DiBiMT. We show that the performance of under-represented languages drops significantly, while the average BLEU metric only slightly decreases. Interestingly, the removal of noisy memorization with compression leads to a significant improvement for some medium-resource languages. Finally, we demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages.",
}