TinyDNABERT is a lightweight genomic language model built from scratch, using a BPE tokenizer and a RoBERTa architecture. It is pre-trained on the human reference genome (GRCh38.p14) and evaluated on the Nucleotide Transformer (NT) benchmark. Training requires only two NVIDIA RTX 4090 GPUs.
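To illustrate the BPE idea behind the tokenizer, here is a minimal pure-Python sketch of byte-pair-encoding merge learning on a DNA string. This is only an illustration of the algorithm, not TinyDNABERT's actual tokenizer (which is presumably trained with a standard BPE library over the full genome); the sequence and merge count are arbitrary:

```python
from collections import Counter

def learn_bpe_merges(sequence, num_merges):
    """Learn BPE merge rules from a DNA sequence, starting from single bases."""
    tokens = list(sequence)
    merges = []
    for _ in range(num_merges):
        # Count all adjacent token pairs in the current segmentation.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # Greedily replace every occurrence of the winning pair with the merged token.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges, tokens

merges, tokens = learn_bpe_merges("ATGATGCGATGATG", 3)
print(merges)   # learned merge rules, most frequent pair first
print(tokens)   # the sequence re-segmented with the learned vocabulary
```

Repeated motifs (here "ATG") get merged into single tokens, which is why BPE yields a compact vocabulary over repetitive genomic sequence.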

For more details, please refer to the TinyDNABERT repository.

Model size: 20M parameters · Tensor type: F32 (Safetensors)
Hugging Face model: ChengsenWang/TinyDNABERT-20M-V1