TinyDNABERT is a lightweight genomic language model built from scratch, using a BPE tokenizer and a RoBERTa architecture. It is pre-trained on the human reference genome (GRCh38.p14) and evaluated on the Nucleotide Transformer (NT) benchmark. Training requires only two NVIDIA RTX 4090 GPUs.
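To illustrate the BPE idea behind the tokenizer, here is a minimal pure-Python sketch of byte-pair-encoding merge learning on a DNA string. This is only an illustration of the algorithm, not TinyDNABERT's actual tokenizer (which is presumably trained with a standard BPE library over the full genome); the sequence and merge count are arbitrary:

```python
from collections import Counter

def learn_bpe_merges(sequence, num_merges):
    """Learn BPE merge rules from a DNA sequence, starting from single bases."""
    tokens = list(sequence)
    merges = []
    for _ in range(num_merges):
        # Count all adjacent token pairs in the current segmentation.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # Greedily replace every occurrence of the winning pair with the merged token.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges, tokens

merges, tokens = learn_bpe_merges("ATGATGCGATGATG", 3)
print(merges)   # learned merge rules, most frequent pair first
print(tokens)   # the sequence re-segmented with the learned vocabulary
```

Repeated motifs (here "ATG") get merged into single tokens, which is why BPE yields a compact vocabulary over repetitive genomic sequence.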

For more details, please refer to the TinyDNABERT repository.

Model size: 20M parameters · Tensor type: F32 (Safetensors)
Hugging Face model: ChengsenWang/TinyDNABERT-20M-V1