RNAGenesis: A Generalist Foundation Model for Functional RNA Therapeutics

Model Description

RNAGenesis is a generalist RNA foundation model that integrates sequence representation, structural prediction, and de novo functional design within a single generative framework. Trained on diverse clustered non-coding RNAs, RNAGenesis leverages a BERT-style encoder, query-based latent compression, and a diffusion-guided decoder enhanced by inference-time alignment with gradient guidance and beam search strategies.

The authors report state-of-the-art performance on:

  • 11 of 13 tasks in the BEACON benchmark
  • Inverse folding and 3D structure prediction
  • De novo structure design
  • RNA therapeutics prediction (ASOs, siRNAs, shRNAs, circRNAs, UTR variants)
  • Functional RNA design including aptamers and CRISPR sgRNA scaffolds

Model Details

  • Model Type: Generalist RNA Foundation Model
  • Architecture: BERT-style encoder with query-based latent compression and diffusion-guided decoder
  • Input: RNA sequences (AUGC notation)
  • Output: Sequence embeddings, structure predictions, functional designs
  • Training Data: Diverse clustered non-coding RNAs
  • Key Features:
    • Sequence representation learning
    • Structural prediction capabilities
    • De novo functional design
    • Inference-time alignment with gradient guidance
    • Beam search optimization strategies

Usage

Installation

pip install transformers torch

Basic Usage

from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/RNAGenesis", trust_remote_code=True)
model = AutoModel.from_pretrained("your-username/RNAGenesis", trust_remote_code=True, torch_dtype=torch.bfloat16)

# Prepare your RNA sequence
rna_sequence = "GCCGGGCAUGGUGGCGCAUGCCUGUAGUCCCAGCUACCCGGGGAGGCUGAGGCAGAAGGAUCACUCGAGCCCAGGAGUUUGAGGUUGCUGUGAGCUAGGCUGACGCCACGGCACUCAGUCUAGCCUGGGCAACAAAGCGAGACUCUGUCUCCA"

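# Tokenize per nucleotide and get embeddings.
# Assumes a single-nucleotide vocabulary: convert_tokens_to_ids expects a list of
# tokens, and a bare string would be looked up as one token.
input_ids = torch.tensor(tokenizer.convert_tokens_to_ids(list(rna_sequence))).unsqueeze(0)
with torch.no_grad():
    outputs = model(input_ids)
    embeddings = outputs.last_hidden_state.mean(dim=1)  # Mean pooling over sequence positions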

print(f"Embedding shape: {embeddings.shape}")
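
The tokenization step depends on how the tokenizer's vocabulary is organized. In the standard `transformers` tokenizer interface, `convert_tokens_to_ids` maps each element of a token list to an id, while a bare string is treated as a single token. The sketch below illustrates that difference with a toy vocabulary. `TOY_VOCAB` and both helper functions are hypothetical and only for illustration; the actual RNAGenesis tokenizer may use different tokens, ids, and special symbols.

```python
# Hypothetical single-nucleotide vocabulary; the real RNAGenesis tokenizer
# may use different tokens, ids, and special symbols.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "A": 2, "U": 3, "G": 4, "C": 5}


def encode_per_nucleotide(sequence, vocab=TOY_VOCAB):
    """Map each nucleotide character to an id, using [UNK] for anything else."""
    unk_id = vocab["[UNK]"]
    return [vocab.get(base, unk_id) for base in sequence]


def encode_as_single_token(sequence, vocab=TOY_VOCAB):
    """Look the whole string up as one token, as a token-to-id lookup does
    when it receives a plain string instead of a list of tokens."""
    return vocab.get(sequence, vocab["[UNK]"])


ids = encode_per_nucleotide("AUGCN")
print(ids)                             # [2, 3, 4, 5, 1]
print(encode_as_single_token("AUGC"))  # 1, the whole string is unknown
```

If the model's tokenizer is per-nucleotide, the list of ids should have one entry per base, which is a quick sanity check before calling the model.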

Advanced Usage - Batch Processing

sequences = [
    "AUGCGAUCGAUCGAUCG",
    "GCGCGCAUAUAUAUAUA",
    "UUUUAAAACCCCGGGGA"
]

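# Process multiple sequences one at a time (no padding needed).
# Reuses `tokenizer` and `model` from Basic Usage; list(seq) assumes per-nucleotide tokens.
embeddings = []
for seq in sequences:
    input_ids = torch.tensor(tokenizer.convert_tokens_to_ids(list(seq))).unsqueeze(0)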
    with torch.no_grad():
        outputs = model(input_ids)
        seq_embedding = outputs.last_hidden_state.mean(dim=1)
        embeddings.append(seq_embedding)

# Stack embeddings
all_embeddings = torch.cat(embeddings, dim=0)
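
The loop above embeds one sequence per forward pass, so every position in the mean is a real token. If sequences of different lengths are instead padded into one batch, a plain mean over the length dimension would also average the padding positions. A common remedy is masked mean pooling. The framework-free sketch below shows the arithmetic on nested lists. The function name and the toy values are illustrative assumptions, and whether the RNAGenesis model accepts an attention mask is not stated in this card.

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, skipping padded positions.

    hidden_states:  nested list shaped [batch][length][dim]
    attention_mask: nested list shaped [batch][length], 1 = real token, 0 = padding
    """
    pooled = []
    for token_vectors, mask in zip(hidden_states, attention_mask):
        dim = len(token_vectors[0])
        n_real = sum(mask)
        if n_real == 0:
            raise ValueError("sequence has no unmasked tokens")
        sums = [0.0] * dim
        for vector, keep in zip(token_vectors, mask):
            if keep:
                for i, value in enumerate(vector):
                    sums[i] += value
        pooled.append([s / n_real for s in sums])
    return pooled


# Two toy sequences padded to length 3, with 2-dimensional token vectors.
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],  # last position is padding
    [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],
]
mask = [[1, 1, 0], [1, 1, 1]]
print(masked_mean_pool(hidden, mask))  # [[2.0, 3.0], [4.0, 4.0]]
```

With PyTorch tensors, the same arithmetic can be written as `(h * m.unsqueeze(-1)).sum(dim=1) / m.sum(dim=1, keepdim=True)`, where `h` is the hidden-state tensor and `m` is the mask cast to the same floating-point dtype.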

Performance Highlights

BEACON Benchmark

  • State-of-the-art performance on 11 of 13 tasks
  • Superior performance in structure-aware modeling tasks

RNATx-Bench (RNA Therapeutics Benchmark)

  • Evaluated on >100,000 experimentally validated sequences
  • Strong predictive performance across:
    • Antisense oligonucleotides (ASOs)
    • Small interfering RNAs (siRNAs)
    • Short hairpin RNAs (shRNAs)
    • Circular RNAs (circRNAs)
    • Untranslated region (UTR) variants

Experimental Validation

  • Aptamer Design: IGFBP3-targeting aptamers with KD values as low as 4.02 nM
  • CRISPR Enhancement: Up to 2.5-fold improvement in editing efficiency across:
    • CRISPR-Cas9 systems
    • Base editing systems
    • Prime editing systems

Limitations

  • Maximum sequence length: Depends on model configuration
  • Input must be valid RNA sequences using standard AUGC notation
  • Model performance may vary on sequences significantly different from training data
  • The model is described in a preprint; its results have not been peer-reviewed
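
Because input must use AUGC notation, it can help to check sequences before tokenizing them. The following is a minimal sketch of such a check, not part of the RNAGenesis code. The function name `normalize_rna` and the optional T-to-U conversion are assumptions made for illustration, and how the model itself treats lowercase letters or ambiguity codes such as N is not documented here.

```python
RNA_ALPHABET = frozenset("AUGC")


def normalize_rna(sequence, convert_dna=False):
    """Return an uppercase RNA string, or raise ValueError on invalid symbols.

    convert_dna=True rewrites T as U; this is a convenience assumed here,
    not a documented RNAGenesis feature.
    """
    # Drop all whitespace (spaces, newlines from FASTA-style wrapping) and uppercase.
    cleaned = "".join(sequence.split()).upper()
    if convert_dna:
        cleaned = cleaned.replace("T", "U")
    if not cleaned:
        raise ValueError("empty sequence")
    invalid = sorted(set(cleaned) - RNA_ALPHABET)
    if invalid:
        raise ValueError(f"invalid symbols for AUGC notation: {invalid}")
    return cleaned


print(normalize_rna("augc gauc"))               # AUGCGAUC
print(normalize_rna("ATGC", convert_dna=True))  # AUGC
```

Raising an error rather than silently dropping unknown symbols keeps the sequence length, and therefore any per-position output, aligned with the original input.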

Citation

If you use this model in your research, please cite:

@article{zhang2024rnagenesis,
  title={RNAGenesis: A Generalist Foundation Model for Functional RNA Therapeutics},
  author={Zhang, Zaixi and Jin, Ruofan and Chao, Linlin and Xu, Guangxue and Zhang, Yikun and Zhou, Guowei and Yin, Di and Guo, Yingqing and Fu, Yaqi and Yang, Yukang and Huang, Kaixuan and Wang, Xiaotong and Zhang, Junze and Yang, Yujie and Yang, Qirong and Xu, Ziyao and Weinan, E and Zhou, Ruhong and Zhang, Xiaoming and Wang, Mengdi and Cong, Le},
  journal={bioRxiv},
  year={2024},
  doi={10.1101/2024.12.30.630826},
  note={Preprint}
}

Paper: https://doi.org/10.1101/2024.12.30.630826

License

This model is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

Access

This model requires approval for access. Please fill out the access request form with:

  • Your intended use case
  • Your affiliation
  • Whether the use is for commercial or research purposes

Authors

Zaixi Zhang, Ruofan Jin, Linlin Chao, Guangxue Xu, Yikun Zhang, Guowei Zhou, Di Yin, Yingqing Guo, Yaqi Fu, Yukang Yang, Kaixuan Huang, Xiaotong Wang, Junze Zhang, Yujie Yang, Qirong Yang, Ziyao Xu, E Weinan, Ruhong Zhou, Xiaoming Zhang, Mengdi Wang, Le Cong

Contact

For questions or issues, please open an issue on the model repository.
