---
base_model:
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
---
# 🧠 xlm_base_8000 (ONNX Version)

This repository contains the ONNX-exported, int8-quantized version of the [`keeper-security/xlm_base_8000`](https://huggingface.co/keeper-security/xlm_base_8000) model, optimized for fast inference with [ONNX Runtime](https://onnxruntime.ai/).

---

## 🚀 Quickstart

### 1. Install dependencies

```bash
pip install huggingface_hub onnxruntime transformers
```

### 2. Load the ONNX model and tokenizer

```python
from huggingface_hub import hf_hub_download
import onnxruntime as ort
from transformers import AutoTokenizer

# Download the ONNX model from the Hub
model_path = hf_hub_download(
    repo_id="keeper-security/xlm_base_8000_onnx_static_int8",
    filename="model_quantized.onnx"
)

# Load the ONNX model
session = ort.InferenceSession(model_path)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("keeper-security/xlm_base_8000")
```

## 📥 Inputs & Outputs

This model expects tokenized inputs with:

- `input_ids`
- `attention_mask`
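
Rather than hard-coding these names, they can be read from the loaded session with `[i.name for i in session.get_inputs()]`. The sketch below shows one way to use that list: a hypothetical helper, `build_feed` (not part of this repository), keeps only the arrays the graph declares and fails early if one is missing. The arrays are stand-in values for illustration, not real tokenizer output.

```python
import numpy as np

def build_feed(encoded, input_names):
    """Return only the entries of `encoded` that the ONNX graph declares as inputs."""
    missing = [name for name in input_names if name not in encoded]
    if missing:
        raise KeyError(f"tokenizer output is missing: {missing}")
    return {name: encoded[name] for name in input_names}

# Stand-in for tokenizer output (illustrative values only).
encoded = {
    "input_ids": np.array([[0, 100, 2]], dtype=np.int64),
    "attention_mask": np.array([[1, 1, 1]], dtype=np.int64),
    "token_type_ids": np.array([[0, 0, 0]], dtype=np.int64),  # extra key, dropped below
}

# With a real session: input_names = [i.name for i in session.get_inputs()]
feed = build_feed(encoded, ["input_ids", "attention_mask"])
print(sorted(feed))
```

The resulting dictionary can be passed directly as the second argument to `session.run`.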

### 🧪 Example Inference

```python
import numpy as np
inputs = tokenizer("[URL]: https://signin.example.edu [HTML]: <input type='password' name='passwd'>", return_tensors="np")

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"]
})

# The first output holds the logits, one row per input text
logits = outputs[0]
# Index of the highest-scoring class for each input
pred_classes = np.argmax(logits, axis=-1)
print(pred_classes)
```
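
If a confidence score is needed alongside the predicted class, the logits can be converted to probabilities with a softmax. This is a minimal sketch, assuming `outputs[0]` contains raw, unnormalized logits of shape `(batch_size, num_classes)`; the `example_logits` values are made up for illustration and do not come from the model.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Illustrative logits for one input and three classes (not real model output).
example_logits = np.array([[2.0, 0.5, -1.0]], dtype=np.float32)

probs = softmax(example_logits)   # each row sums to 1
confidence = probs.max(axis=-1)   # probability of the top class per input
print(probs.argmax(axis=-1), confidence)
```

With the real model, `softmax(logits)` would be applied to the `logits` array from the inference example.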


## 🏷️ Mapping Model Output to Labels

The inference code above produces `pred_classes`, an array with one class index per input text. Each index is an integer between `0` and `44`, identifying one of the 45 possible classes.

To convert the output index to a human-readable label, use the following mapping:

```python
unique_labels = [
  "ACCOUNT_CREATION_PASSWORD",
  "ADDRESS_CITY",
  "ADDRESS_COUNTRY",
  "ADDRESS_LINE1",
  "ADDRESS_LINE2",
  "ADDRESS_STATE",
  "ADDRESS_ZIP",
  "ALTERNATIVE_FAMILY_NAME",
  "ALTERNATIVE_FULL_NAME",
  "ALTERNATIVE_GIVEN_NAME",
  "AMBIGUOUS",
  "BIRTH_DATE_DAY",
  "BIRTH_DATE_MONTH",
  "BIRTH_DATE_YEAR",
  "COMPANY_NAME",
  "CONFIRMATION_PASSWORD",
  "CREDIT_CARD_EXP_DATE_MONTH_AND_YEAR",
  "CREDIT_CARD_EXP_DATE_YEAR",
  "CREDIT_CARD_EXP_MONTH",
  "CREDIT_CARD_NUMBER",
  "CREDIT_CARD_STANDALONE_VERIFICATION_CODE",
  "CREDIT_CARD_TYPE",
  "CREDIT_CARD_VERIFICATION_CODE",
  "EMAIL_ADDRESS",
  "IBAN_VALUE",
  "MALICIOUS_LABEL",
  "MERCHANT_EMAIL_SIGNUP",
  "MERCHANT_PROMO_CODE",
  "NAME_FIRST",
  "NAME_FULL",
  "NAME_LAST",
  "NAME_MIDDLE",
  "NAME_MIDDLE_INITIAL",
  "NAME_PREFIX",
  "NAME_SUFFIX",
  "NATIONAL_IDENTITY_NUMBER",
  "NEW_PASSWORD",
  "PASSWORD",
  "PHONE_NUMBER",
  "PIN_CODE",
  "PROBABLY_NEW_PASSWORD",
  "SEARCH",
  "TWO_FACTOR_CODE",
  "UNKNOWN",
  "USERNAME"
]

label2id = {label: i for i, label in enumerate(unique_labels)}
id2label = {i: label for label, i in label2id.items()}

# Example usage: take the top class for the first (here, only) input
predicted_class_index = int(logits[0].argmax())
predicted_label = id2label[predicted_class_index]
print(predicted_label)
```

This prints a string label such as `"EMAIL_ADDRESS"` or `"PASSWORD"`, corresponding to the model's prediction for the first input.
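
When several inputs are classified at once, or when the runner-up classes matter, the same mapping can be applied row by row. The following is a sketch only: `top_k_labels` is a hypothetical helper, and `demo_labels` and `demo_logits` are small made-up stand-ins for `unique_labels` and the model's logits.

```python
import numpy as np

def top_k_labels(logits, labels, k=3):
    """For each row of logits, return the k highest-scoring (label, score) pairs."""
    logits = np.atleast_2d(logits)
    # Sort class indices by descending score and keep the first k per row.
    order = np.argsort(-logits, axis=-1)[:, :k]
    return [
        [(labels[i], float(row[i])) for i in idx]
        for row, idx in zip(logits, order)
    ]

# Illustrative stand-ins (not real model output).
demo_labels = ["EMAIL_ADDRESS", "PASSWORD", "USERNAME", "UNKNOWN"]
demo_logits = np.array([
    [0.1, 3.2, 1.5, -0.7],
    [2.4, 0.0, 2.9, 0.3],
])

ranked = top_k_labels(demo_logits, demo_labels, k=2)
for row in ranked:
    print(row)
```

With the real model, the call would be `top_k_labels(logits, unique_labels)`.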