Model Card for meinvirgos/aina-translator-es-ast-onnx

Spanish-to-Asturian translator.

ONNX version of projecte-aina/aina-translator-es-ast.

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

  • Developed by: [email protected]
  • Funded by [optional]:
  • Shared by [optional]:
  • Model type: M2M100
  • Language(s) (NLP): Spanish, Asturian
  • License: cc-by-nc-4.0
  • Finetuned from model [optional]: projecte-aina/aina-translator-es-ast

Model Sources [optional]

  • Repository: projecte-aina/aina-translator-es-ast
  • Paper [optional]:
  • Demo [optional]:

Uses

Translation from Spanish to Asturian.

Direct Use

The model is intended to be used as an intermediate step in converting to other formats.

Out-of-Scope Use

Bias, Risks, and Limitations

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import NllbTokenizer

model = ORTModelForSeq2SeqLM.from_pretrained("meinvirgos/aina-translator-es-ast-onnx")
print("model loaded")

tokenizer = NllbTokenizer.from_pretrained("meinvirgos/aina-translator-es-ast-onnx", src_lang="spa_Latn")
print("tokenizer loaded")

# Tokenize the Spanish input
encoded_es = tokenizer("Hola papá", return_tensors="pt")
# Generate the translation, forcing Asturian as the target language
generated_tokens = model.generate(**encoded_es, forced_bos_token_id=tokenizer.convert_tokens_to_ids("ast_Latn"))
# Decode the output
output_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)

print(output_text)  # Output: "Hola pá"

How the model was obtained

Initialization

!pip install "optimum[exporters]"

Exporting the model to ONNX

from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Export the original PyTorch model to ONNX on the fly
model = ORTModelForSeq2SeqLM.from_pretrained("projecte-aina/aina-translator-es-ast", export=True)
print("model loaded")

Testing and saving

from transformers import NllbTokenizer

# The original model uses the NLLB tokenizer
tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", src_lang="spa_Latn")
print("tokenizer loaded")

# Quick sanity check: translate one sentence
encoded_es = tokenizer("Hola papá", return_tensors="pt")
generated_tokens = model.generate(**encoded_es, forced_bos_token_id=tokenizer.convert_tokens_to_ids("ast_Latn"))
output_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)

print(output_text)  # Output: "Hola pá"

# Save the ONNX model (and the tokenizer, so the uploaded repo is self-contained)
model.save_pretrained("save_dir")
tokenizer.save_pretrained("save_dir")

Uploading to the Hugging Face Hub

from kaggle_secrets import UserSecretsClient
from huggingface_hub import HfApi, login

# Read the Hugging Face token from Kaggle secrets and log in
miToken = UserSecretsClient().get_secret("HF_TOKEN")
login(token=miToken)

api = HfApi()

# Create the target model repo once (uncomment on the first run)
#api.create_repo(
#    repo_id="meinvirgos/aina-translator-es-ast-onnx",
#    repo_type="model",
#    private=False,
#)

# Upload all the content from the local folder to the root of the model repo
api.upload_folder(
    folder_path="./save_dir",
    repo_id="meinvirgos/aina-translator-es-ast-onnx",
    repo_type="model",
)

Hardware

Software

optimum

Model Card Authors [optional]

[email protected]
