# `quickmt-zh-en` Neural Machine Translation Model

# Usage

## Install `quickmt`

```bash
git clone https://github.com/quickmt/quickmt.git
pip install ./quickmt/
```

## Download model

```bash
quickmt-model-download quickmt/quickmt-zh-en ./quickmt-zh-en
```

## Use model

```python
from quickmt import Translator

# Auto-detects GPU; set device to "cpu" to force CPU inference
t = Translator("./quickmt-zh-en/", device="auto")

# Translate - set beam_size to 5 for higher quality (but slower speed)
t(["他补充道:“我们现在有 4 个月大没有糖尿病的老鼠,但它们曾经得过该病。”"], beam_size=1)

# Get alternative translations by sampling
# You can pass any CTranslate2 `translate_batch` arguments
t(["他补充道:“我们现在有 4 个月大没有糖尿病的老鼠,但它们曾经得过该病。”"],
  sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
```

# Model Information

* Trained using [`eole`](https://github.com/eole-nlp/eole)
* Exported to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format for fast inference
* Training data: https://huggingface.co/datasets/quickmt/quickmt-train.zh-en/tree/main

## Metrics

BLEU and CHRF2 are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the Flores200 `devtest` test set ("zho_Hans" -> "eng_Latn"). A sketch for reproducing these scores is given at the end of this card.

| Model | BLEU | CHRF2 |
| ---- | ---- | ---- |
| quickmt/quickmt-zh-en | 28.58 | 57.46 |
| Helsinki-NLP/opus-mt-zh-en | 23.35 | 53.60 |
| facebook/m2m100_418M | 18.96 | 50.06 |
| facebook/m2m100_1.2B | 24.68 | 54.68 |
| facebook/nllb-200-distilled-600M | 26.22 | 55.17 |
| facebook/nllb-200-distilled-1.3B | 28.54 | 57.34 |
| google/madlad400-3b-mt | 28.74 | 58.01 |

## Training Configuration

```yaml
## IO
save_data: zh_en/data_spm
overwrite: True
seed: 1234
report_every: 100
valid_metrics: ["BLEU"]
tensorboard: true
tensorboard_log_dir: tensorboard

### Vocab
src_vocab: zh-en/src.eole.vocab
tgt_vocab: zh-en/tgt.eole.vocab
src_vocab_size: 20000
tgt_vocab_size: 20000
vocab_size_multiple: 8
share_vocab: False
n_sample: 0

data:
    corpus_1:
        path_src: hf://quickmt/quickmt-train-zh-en/zh
        path_tgt: hf://quickmt/quickmt-train-zh-en/en
        path_sco: hf://quickmt/quickmt-train-zh-en/sco
    valid:
        path_src: zh-en/dev.zho
        path_tgt: zh-en/dev.eng

transforms: [sentencepiece, filtertoolong]
transforms_configs:
    sentencepiece:
        src_subword_model: "zh-en/src.spm.model"
        tgt_subword_model: "zh-en/tgt.spm.model"
    filtertoolong:
        src_seq_length: 512
        tgt_seq_length: 512

training:
    # Run configuration
    model_path: quickmt-zh-en
    keep_checkpoint: 4
    save_checkpoint_steps: 1000
    train_steps: 200000
    valid_steps: 1000

    # Train on a single GPU
    world_size: 1
    gpu_ranks: [0]

    # Batching
    batch_type: "tokens"
    batch_size: 13312
    valid_batch_size: 13312
    batch_size_multiple: 8
    accum_count: [4]
    accum_steps: [0]

    # Optimizer & Compute
    compute_dtype: "bfloat16"
    optim: "pagedadamw8bit"
    learning_rate: 1.0
    warmup_steps: 10000
    decay_method: "noam"
    adam_beta2: 0.998

    # Data loading
    bucket_size: 262144
    num_workers: 4
    prefetch_factor: 100

    # Hyperparams
    dropout_steps: [0]
    dropout: [0.1]
    attention_dropout: [0.1]
    max_grad_norm: 0
    label_smoothing: 0.1
    average_decay: 0.0001
    param_init_method: xavier_uniform
    normalization: "tokens"

model:
    architecture: "transformer"
    layer_norm: standard
    share_embeddings: false
    share_decoder_embeddings: true
    add_ffnbias: true
    mlp_activation_fn: gated-silu
    add_estimator: false
    add_qkvbias: false
    norm_eps: 1e-6
    hidden_size: 1024
    encoder:
        layers: 8
    decoder:
        layers: 2
    heads: 16
    transformer_ff: 4096
    embeddings:
        word_vec_size: 1024
        position_encoding_type: "SinusoidalInterleaved"
```
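
## Reproducing the Metrics

The following is a minimal sketch of how the `quickmt/quickmt-zh-en` row in the table above could be reproduced with sacrebleu's Python API. It assumes the Flores200 `devtest` source ("zho_Hans") and reference ("eng_Latn") sentences have been saved locally as one-sentence-per-line text files (the file names below are illustrative), and that calling the `Translator` returns a list of translated strings as in the usage example above.

```python
from quickmt import Translator
from sacrebleu.metrics import BLEU, CHRF

t = Translator("./quickmt-zh-en/", device="auto")

# Illustrative file names -- point these at your local Flores200 devtest files
with open("flores200.devtest.zho_Hans", encoding="utf-8") as f:
    src = [line.strip() for line in f]
with open("flores200.devtest.eng_Latn", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

# beam_size=5 is the higher-quality setting mentioned in the usage section;
# the return value is assumed to be a list of translated strings
hyps = t(src, beam_size=5)

bleu = BLEU()   # sacrebleu defaults (13a tokenizer, case-sensitive)
chrf = CHRF()   # sacrebleu's default CHRF is chrF2 (char_order=6, beta=2)
print(bleu.corpus_score(hyps, [refs]))
print(chrf.corpus_score(hyps, [refs]))
```

Scores computed through other toolchains (e.g. the `sacrebleu` CLI) should match as long as the same untokenized references and default metric settings are used.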