Riva-Translate-4B-Instruct

Model Overview

The Riva-Translate-4B-Instruct neural machine translation model translates text between 12 languages: English (en), German (de), European Spanish (es-ES), Latin American Spanish (es-US), French (fr), Brazilian Portuguese (pt-BR), Russian (ru), Simplified Chinese (zh-CN), Traditional Chinese (zh-TW), Japanese (ja), Korean (ko), and Arabic (ar). The model is based on a decoder-only Transformer architecture. It is a fine-tuned version of a 4B base model that was pruned and distilled from nvidia/Mistral-NeMo-Minitron-8B-Base using our LLM compression technique, then trained with multi-stage continued pretraining (CPT) and supervised fine-tuning (SFT). It uses tiktoken as the tokenizer and supports a context length of 8K tokens.

Model Developer: NVIDIA

Model Dates: Riva-Translate-4B-Instruct was trained between Jan 2025 and April 2025.

License

NVIDIA Open Model License Agreement

Prompt Format:

We recommend using the following prompt template, which was used to fine-tune the model. The model may not perform optimally without it.

<s>System
{system prompt}</s>
<s>User
{user prompt}</s>
<s>Assistant\n
  • Note that a newline character (\n) must be added after <s>Assistant as a generation prompt.
  • Note that the correct language name must be used in the prompt: 'ar': 'Arabic', 'en': 'English', 'de': 'German', 'es-es': 'European Spanish', 'es-us': 'Latin American Spanish', 'fr': 'French', 'ja': 'Japanese', 'ko': 'Korean', 'ru': 'Russian', 'zh-cn': 'Simplified Chinese', 'zh-tw': 'Traditional Chinese', 'pt-br': 'Brazilian Portuguese'

For example, to translate an English sentence into Simplified Chinese:

<s>System
You are an expert at translating text from English to Simplified Chinese.</s>
<s>User
What is the Simplified Chinese translation of the sentence: The GRACE mission is a collaboration between NASA and the German Aerospace Center.?</s>
<s>Assistant
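The template and language-name mapping above can be combined into a small helper. This is an illustrative sketch, not part of the model's API: `LANG_NAMES` and `build_prompt` are hypothetical names introduced here, and the question phrasing mirrors the example above.

```python
# Illustrative helper (not a documented API): builds the recommended prompt
# from the language-code -> language-name mapping listed above.
LANG_NAMES = {
    "ar": "Arabic", "en": "English", "de": "German",
    "es-es": "European Spanish", "es-us": "Latin American Spanish",
    "fr": "French", "ja": "Japanese", "ko": "Korean", "ru": "Russian",
    "zh-cn": "Simplified Chinese", "zh-tw": "Traditional Chinese",
    "pt-br": "Brazilian Portuguese",
}

def build_prompt(text: str, src: str, tgt: str) -> str:
    src_name, tgt_name = LANG_NAMES[src], LANG_NAMES[tgt]
    system = f"You are an expert at translating text from {src_name} to {tgt_name}."
    user = f"What is the {tgt_name} translation of the sentence: {text}?"
    # The trailing newline after <s>Assistant serves as the generation prompt.
    return f"<s>System\n{system}</s>\n<s>User\n{user}</s>\n<s>Assistant\n"
```

A prompt built this way can be tokenized directly in place of `apply_chat_template` in the Usage section below.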

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM


tokenizer = AutoTokenizer.from_pretrained("nvidia/Riva-Translate-4B-Instruct")
model = AutoModelForCausalLM.from_pretrained("nvidia/Riva-Translate-4B-Instruct").cuda()


# Use the prompt template
messages = [
    {
        "role": "system",
        "content": "You are an expert at translating text from English to Simplified Chinese.",
    },
    {"role": "user", "content": "What is the Simplified Chinese translation of the sentence: The GRACE mission is a collaboration between NASA and the German Aerospace Center.?"},
]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(tokenized_chat, max_new_tokens=128, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))
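Because `tokenizer.decode(outputs[0])` decodes the full sequence, the printed text echoes the prompt. A minimal post-processing sketch, assuming the prompt template shown above (the `extract_translation` helper is a hypothetical name, not part of the model's API):

```python
# Illustrative post-processing (an assumption based on the prompt template,
# not a documented API): the translation is the text after the final
# "<s>Assistant" marker, with the end-of-sequence token stripped.
def extract_translation(decoded: str) -> str:
    translation = decoded.split("<s>Assistant")[-1]
    return translation.replace("</s>", "").strip()
```

Alternatively, decoding only the newly generated tokens (`outputs[0][tokenized_chat.shape[1]:]`) with `skip_special_tokens=True` avoids the prompt echo entirely.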

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Technical Limitations & Mitigation:

Accuracy varies based on the characteristics of input (Domain, Use Case, Noise, Context, etc.). Grammar errors and semantic issues may be present. As a potential mitigation, the user can change the prompt to get a better translation.

Use Case Restrictions:

Abide by NVIDIA Open Model License Agreement

Model Weights

The weights are distributed in Safetensors format: 4.18B parameters, BF16 tensor type.