---
base_model:
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
---

# 🧠 Model Name (ONNX Version)

This repository contains the ONNX-exported version of the [`keeper-security/xlm_base_8000`](https://huggingface.co/keeper-security/xlm_base_8000) model, optimized for fast inference with [ONNX Runtime](https://onnxruntime.ai/).

---

## 🚀 Quickstart

### 1. Install dependencies

```bash
pip install huggingface_hub onnxruntime transformers
```

### 2. Load the ONNX model and tokenizer

```python
from huggingface_hub import hf_hub_download
import onnxruntime as ort
from transformers import AutoTokenizer

# Download the ONNX model from the Hub
model_path = hf_hub_download(
    repo_id="keeper-security/xlm_base_8000_onnx_static_int8",
    filename="model_quantized.onnx"
)

# Load the ONNX model
session = ort.InferenceSession(model_path)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("keeper-security/xlm_base_8000")
```

## 📥 Inputs & Outputs

This model expects tokenized inputs with:

- `input_ids`
- `attention_mask`

### 🧪 Example Inference

```python
import numpy as np

inputs = tokenizer("[URL]: https://signin.example.edu [HTML]: ", return_tensors="np")

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"]
})

# Example: extract logits (one row of class scores per input)
logits = outputs[0]
pred_classes = np.argmax(logits, axis=-1)
pred_classes
```

## 🏷️ Mapping Model Output to Labels

The inference code above yields one class index per input, an integer between `0` and `44` that identifies one of the 45 possible classes.
To convert the output index to a human-readable label, use the following mapping:

```python
unique_labels = [
    "ACCOUNT_CREATION_PASSWORD",
    "ADDRESS_CITY",
    "ADDRESS_COUNTRY",
    "ADDRESS_LINE1",
    "ADDRESS_LINE2",
    "ADDRESS_STATE",
    "ADDRESS_ZIP",
    "ALTERNATIVE_FAMILY_NAME",
    "ALTERNATIVE_FULL_NAME",
    "ALTERNATIVE_GIVEN_NAME",
    "AMBIGUOUS",
    "BIRTH_DATE_DAY",
    "BIRTH_DATE_MONTH",
    "BIRTH_DATE_YEAR",
    "COMPANY_NAME",
    "CONFIRMATION_PASSWORD",
    "CREDIT_CARD_EXP_DATE_MONTH_AND_YEAR",
    "CREDIT_CARD_EXP_DATE_YEAR",
    "CREDIT_CARD_EXP_MONTH",
    "CREDIT_CARD_NUMBER",
    "CREDIT_CARD_STANDALONE_VERIFICATION_CODE",
    "CREDIT_CARD_TYPE",
    "CREDIT_CARD_VERIFICATION_CODE",
    "EMAIL_ADDRESS",
    "IBAN_VALUE",
    "MALICIOUS_LABEL",
    "MERCHANT_EMAIL_SIGNUP",
    "MERCHANT_PROMO_CODE",
    "NAME_FIRST",
    "NAME_FULL",
    "NAME_LAST",
    "NAME_MIDDLE",
    "NAME_MIDDLE_INITIAL",
    "NAME_PREFIX",
    "NAME_SUFFIX",
    "NATIONAL_IDENTITY_NUMBER",
    "NEW_PASSWORD",
    "PASSWORD",
    "PHONE_NUMBER",
    "PIN_CODE",
    "PROBABLY_NEW_PASSWORD",
    "SEARCH",
    "TWO_FACTOR_CODE",
    "UNKNOWN",
    "USERNAME"
]

label2id = {label: i for i, label in enumerate(unique_labels)}
id2label = {i: label for label, i in label2id.items()}

# Example usage (assumes `logits` holds the scores for a single input):
predicted_class_index = int(logits.argmax())
predicted_label = id2label[predicted_class_index]
print(predicted_label)
```

This prints a string label such as `"EMAIL_ADDRESS"` or `"PASSWORD"` corresponding to the model's prediction.
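
The mapping above keeps only the single best class. If you also want a rough confidence score or a few runner-up labels, one option is to apply a softmax to the logits and rank the results. The following is a minimal sketch, not part of the original model card: the helper names `softmax` and `top_k_labels` are hypothetical, and the demo logits are made-up values for a three-label subset.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(
    scores: Sequence[float], labels: Sequence[str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    if len(scores) != len(labels):
        raise ValueError("expected exactly one logit per label")
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for a three-label subset, for illustration only
demo_labels = ["EMAIL_ADDRESS", "PASSWORD", "USERNAME"]
demo_logits = [2.0, 0.5, 1.0]

for label, prob in top_k_labels(demo_logits, demo_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the real model you would call `top_k_labels(logits[0], unique_labels)` for a single input. Treat the resulting probabilities as a relative ranking signal; whether they are well calibrated for this quantized model is not something this README states.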