ProtT5-XL-UniRef50 Encoder (ONNX, Half-Precision)

An optimized ONNX version of the encoder-only, half-precision ProtT5-XL-UniRef50 model for efficient protein embeddings.

This is an ONNX-converted version of Rostlab/prot_t5_xl_half_uniref50-enc, optimized for production inference.

Model Description

ProtT5-XL-UniRef50 is based on the T5-3B model and was pretrained on a large corpus of protein sequences in a self-supervised fashion. This ONNX version contains only the encoder, in half precision (float16), enabling efficient generation of per-residue and per-protein representations.

Key Features:

  • 🚀 Optimized Inference: Runs on ONNX Runtime for framework-independent deployment
  • 💾 Reduced Memory: Half-precision with external weights (2.25GB total)
  • ⚡ Hardware Agnostic: Supports CPU, GPU, and specialized accelerators

Conversion Details

This model was converted from the original PyTorch model with the following optimizations:

  • ONNX Opset: Version 14 for broad compatibility
  • Precision: FP16 (half-precision) maintained from original
  • External Weights: Large weight matrices stored separately for efficient loading
  • Dynamic Shapes: Supports variable batch sizes and sequence lengths (see the export sketch below)
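
For reference, here is a minimal sketch of how an equivalent export could be produced with torch.onnx.export. The repository's convert.py is the authoritative script; the dummy shapes and axis names below are illustrative:

import torch
from transformers import T5EncoderModel

# Load the original half-precision encoder (FP16 export may require a GPU,
# since some ops lack FP16 CPU kernels)
model = T5EncoderModel.from_pretrained(
    "Rostlab/prot_t5_xl_half_uniref50-enc", torch_dtype=torch.float16
).eval()

dummy_ids = torch.ones(1, 16, dtype=torch.int64)
dummy_mask = torch.ones(1, 16, dtype=torch.int64)

# Weights beyond the 2GB protobuf limit end up in external data files
torch.onnx.export(
    model,
    (dummy_ids, dummy_mask),
    "model.onnx",
    opset_version=14,
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
)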

Performance Metrics

Accuracy Validation (PyTorch vs ONNX):

  • Average Cosine Similarity: 1.000244 (near-perfect preservation)
  • Maximum Absolute Difference: 0.001953 (minimal numerical differences)
  • All test cases: Cosine similarity > 0.998 (a sketch of the metric computation follows)
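
These metrics can be reproduced with a few lines of NumPy (a sketch; pt_emb and onnx_emb stand for PyTorch and ONNX embeddings of the same input):

import numpy as np

def compare_embeddings(pt_emb: np.ndarray, onnx_emb: np.ndarray):
    # Flatten to vectors and compare in float32 to avoid FP16 accumulation error
    a = pt_emb.ravel().astype(np.float32)
    b = onnx_emb.ravel().astype(np.float32)
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    max_abs_diff = float(np.max(np.abs(a - b)))
    return cosine, max_abs_diff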

Inference Performance (example results):

  • Short sequences (7-20 AAs): ~750ms
  • Long sequences (140+ AAs): ~1.1s
  • Performance varies by hardware and ONNX Runtime providers

Performance Notes:

  • Apple Silicon (M4 Macs): PyTorch with MPS backend typically outperforms ONNX CPU inference due to optimized GPU acceleration. Use PyTorch for local M4 development.
  • NVIDIA GPUs: ONNX Runtime with CUDA provider is often competitive with or faster than PyTorch for inference, thanks to aggressive graph optimizations and inference-specific tuning.
  • Deployment: ONNX provides consistent cross-platform behavior and simpler production deployment regardless of hardware (a provider-selection sketch follows)
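
For example, ONNX Runtime can be pointed at the best available execution provider when the session is created (a sketch; CUDAExecutionProvider requires the onnxruntime-gpu build):

import onnxruntime as ort

# Prefer CUDA when available, otherwise fall back to CPU
available = ort.get_available_providers()
providers = [p for p in ("CUDAExecutionProvider", "CPUExecutionProvider") if p in available]
session = ort.InferenceSession("./prot_t5_onnx/model.onnx", providers=providers)
print("Active providers:", session.get_providers())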

Getting the Model

System Requirements

pip install onnxruntime transformers sentencepiece

Option 1: Hugging Face CLI (Recommended)

First install the Hugging Face CLI if you don't already have it (pip install -U "huggingface_hub[cli]").

# Download the model
huggingface-cli download Rostlab/prot-t5-xl-uniref50-enc-onnx --local-dir ./prot_t5_onnx
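
Alternatively, the same files can be fetched from Python via huggingface_hub's snapshot_download (a minimal sketch):

from huggingface_hub import snapshot_download

# Download all model files into a local directory
snapshot_download(
    repo_id="Rostlab/prot-t5-xl-uniref50-enc-onnx",
    local_dir="./prot_t5_onnx",
)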

Option 2: Git LFS

# Clone the repository (requires Git LFS for large files)
git lfs install
git clone https://huggingface.co/Rostlab/prot-t5-xl-uniref50-enc-onnx
cd prot-t5-xl-uniref50-enc-onnx

Usage

ONNX Runtime (Recommended)

import onnxruntime as ort
import numpy as np
from transformers import T5Tokenizer
import re

# Load tokenizer from the downloaded directory
tokenizer = T5Tokenizer.from_pretrained("./prot_t5_onnx", do_lower_case=False, legacy=False)

# Load ONNX model
session = ort.InferenceSession("./prot_t5_onnx/model.onnx")

# Example sequences
sequences = ["PRTEINO", "SEQWENCE"]

# Preprocess: replace rare amino acids and add spaces
sequences = [" ".join(list(re.sub(r"[UZOB]", "X", seq))) for seq in sequences]

# Tokenize
inputs = tokenizer(sequences, return_tensors="np", padding=True, truncation=False)

# Run inference
outputs = session.run(
    None, 
    {
        "input_ids": inputs["input_ids"].astype(np.int64),
        "attention_mask": inputs["attention_mask"].astype(np.int64)
    }
)

embeddings = outputs[0]  # Shape: [batch_size, seq_len, 1024]

# Extract per-residue embeddings; the T5 encoder adds only a trailing </s>,
# so residues start at index 0 (drop the </s> and any padding)
len_0 = len(sequences[0].split())
len_1 = len(sequences[1].split())
emb_0 = embeddings[0, :len_0]  # First sequence
emb_1 = embeddings[1, :len_1]  # Second sequence

# Per-protein embeddings (mean pooling)
protein_emb_0 = np.mean(emb_0, axis=0)  # Shape: [1024]
protein_emb_1 = np.mean(emb_1, axis=0)  # Shape: [1024]

print(f"Protein 1 embedding shape: {protein_emb_0.shape}")
print(f"Protein 2 embedding shape: {protein_emb_1.shape}")

Model Architecture

  • Base Model: T5-3B Encoder
  • Hidden Size: 1024
  • Layers: 24
  • Attention Heads: 16
  • Parameters: ~1.2B (encoder only)
  • Precision: FP16 (half-precision)
  • Input: Tokenized amino acid sequences
  • Output: Dense embeddings (1024-dimensional per token); see the shape-inspection sketch below
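
These shapes can be confirmed directly from the exported graph (a quick sketch; the exact input/output names depend on the export settings):

import onnxruntime as ort

# Print the graph's inputs and outputs; dynamic axes appear as symbolic names
session = ort.InferenceSession("./prot_t5_onnx/model.onnx")
for t in session.get_inputs():
    print("input: ", t.name, t.shape, t.type)
for t in session.get_outputs():
    print("output:", t.name, t.shape, t.type)
# Expect a rank-3 output: [batch, sequence, 1024]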

Training Data

Pretrained on UniRef50, a large corpus of protein sequences, using a BART-like MLM denoising objective:

  • Masking: 15% of amino acids randomly masked
  • Vocabulary: 20 standard amino acids + special tokens
  • Preprocessing: Rare amino acids (U/Z/O/B) replaced with X
  • Format: Space-separated amino acids, as required by the T5 tokenizer (a preprocessing helper is sketched below)
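
A small helper capturing both preprocessing steps (a sketch; the function name is illustrative):

import re

def preprocess(seq: str) -> str:
    # Replace rare amino acids (U/Z/O/B) with X and space-separate residues
    return " ".join(re.sub(r"[UZOB]", "X", seq.upper()))

print(preprocess("MKUVB"))  # "M K X V X"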

Intended Use

Primary Use Cases

  • Protein Embeddings: Generate dense vector representations of proteins
  • Feature Extraction: Create features for downstream ML models
  • Similarity Analysis: Compute protein sequence similarities (example after this list)
  • Protein Classification: As feature extractor for classification tasks
  • Production Inference: High-throughput protein processing
  • Model Deployment: Optimized inference for serving applications
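
For example, the per-protein embeddings from the usage section can be compared with cosine similarity (a sketch; protein_emb_0 and protein_emb_1 come from the code above):

import numpy as np

# Pairwise cosine similarity for a batch of per-protein embeddings [n, 1024]
def pairwise_cosine(X: np.ndarray) -> np.ndarray:
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T

protein_embs = np.stack([protein_emb_0, protein_emb_1])
print(pairwise_cosine(protein_embs))  # 2x2 similarity matrix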

Limitations

  • Uppercase Only: Requires uppercase amino acid sequences
  • Memory Requirements: ~3GB total (model + weights)
  • Sequence Length: Optimal for sequences up to 1024 amino acids
  • Domain: Limited to natural protein sequences

Repository Contents

Model Files

├── model.onnx                    # Main ONNX model (573KB)
├── shared.weight                 # Shared parameters (256KB)
├── onnx__MatMul_[0-143]          # External weight matrices (2.25GB total)
├── spiece.model                  # SentencePiece tokenizer (238KB)
├── tokenizer_config.json         # Tokenizer configuration
├── special_tokens_map.json       # Special tokens mapping
└── added_tokens.json             # Additional tokens

Scripts

├── convert.py                    # ONNX conversion script
└── test_onnx.py                  # Model validation and testing

Conversion & Testing Scripts

convert.py

Script to convert ProtT5 encoder models to ONNX format:

# Convert the original model to ONNX
python convert.py --model_name Rostlab/prot_t5_xl_half_uniref50-enc --output_dir ./prot_t5_onnx

# Options:
# --model_name: Hugging Face model identifier
# --output_dir: Directory to save converted model
# --max_sequence_length: Maximum sequence length (default: 1024)
# --fp16: Use half precision (default: True)
# --no_fp16: Disable half precision

Features:

  • Converts to ONNX opset 14 with dynamic shapes
  • Preserves half-precision (FP16) from original model
  • Exports external weights for large matrices
  • Includes tokenizer files for complete package

test_onnx.py

Script to validate ONNX model accuracy and performance:

# Test the converted model
python test_onnx.py --onnx_dir ./prot_t5_onnx

# Test with custom sequences
python test_onnx.py --onnx_dir ./prot_t5_onnx --sequences "MKFVPKX" "ACDEFG"

# Options:
# --onnx_dir: Directory containing ONNX model and tokenizer
# --original_model: Original PyTorch model for comparison
# --sequences: Custom protein sequences to test

Validation Features:

  • Accuracy Testing: Compares PyTorch vs ONNX outputs
  • Performance Benchmarking: Measures inference speed
  • Cosine Similarity: Validates embedding preservation (> 0.998)
  • Rare Amino Acids: Tests U/Z/O/B β†’ X replacement
  • Variable Lengths: Tests sequences from short to long

Example Output:

PROTT5 ONNX TEST RESULTS
============================================================
Accuracy Tests: ✅ PASS
  Average Cosine Similarity: 1.000244
  Maximum Absolute Difference: 0.001953

Performance Results:
  PyTorch Average Time: 0.2140s
  ONNX Average Time: 0.1580s
  Speedup: 1.35x
============================================================

Conversion Process

This ONNX model was converted using the following process:

  1. Base Model: Rostlab/prot_t5_xl_half_uniref50-enc
  2. Conversion Tool: PyTorch β†’ ONNX with optimizations
  3. Validation: Comprehensive accuracy testing vs original PyTorch model
  4. Optimization: External weights, dynamic shapes, FP16 precision

Citation

@article{Elnaggar2020.07.12.199554,
    author = {Elnaggar, Ahmed and Heinzinger, Michael and Dallago, Christian and Rehawi, Ghalia and Wang, Yu and Jones, Llion and Gibbs, Tom and Feher, Tamas and Angerer, Christoph and Steinegger, Martin and Bhowmik, Debsindhu and Rost, Burkhard},
    title = {ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing},
    elocation-id = {2020.07.12.199554},
    year = {2020},
    doi = {10.1101/2020.07.12.199554},
    publisher = {Cold Spring Harbor Laboratory},
    journal = {bioRxiv}
}

License

MIT License

Contact

For questions about this ONNX conversion or deployment, please open an issue on the model repository.
