Distilled-RoBERTa
This model is a distilled RoBERTa model, trained on the SQuAD 2.0 training set and then fine-tuned on the NewsQA dataset.
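A minimal usage sketch with the Transformers question-answering pipeline. The repo id below is a placeholder, not the actual identifier of this model; substitute the id shown on this card.

```python
from transformers import pipeline

# "your-org/distilled-roberta-newsqa" is a placeholder repo id (assumption);
# replace it with this model's actual Hub identifier.
qa = pipeline("question-answering", model="your-org/distilled-roberta-newsqa")

result = qa(
    question="Who founded the company?",
    context="The company was founded in 1998 by Jane Doe in Seattle.",
)
print(result["answer"], result["score"])
```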
Hyperparameters
batch_size = 16
n_epochs = 3
max_seq_len = 512
learning_rate = 2e-5
optimizer = AdamW
lr_schedule = LinearWarmup
weight_decay = 0.01
embeds_dropout_prob = 0.1
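A sketch of how these hyperparameters could map onto a Transformers `TrainingArguments` fine-tuning setup. The base checkpoint name and output directory are assumptions for illustration; the warmup length is not specified on this card, and the embedding dropout is a model-config value rather than a training argument.

```python
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    TrainingArguments,
)

# Placeholder checkpoint (assumption); use the actual base model for this card.
model_name = "your-org/distilled-roberta-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# embeds_dropout_prob = 0.1 lives in the model config, not in TrainingArguments.
model.config.hidden_dropout_prob = 0.1

training_args = TrainingArguments(
    output_dir="./distilled-roberta-newsqa",   # assumed output path
    per_device_train_batch_size=16,            # batch_size = 16
    num_train_epochs=3,                        # n_epochs = 3
    learning_rate=2e-5,                        # learning_rate = 2e-5
    weight_decay=0.01,                         # weight_decay = 0.01
    lr_scheduler_type="linear",                # LinearWarmup schedule
    warmup_ratio=0.1,                          # warmup length not given on this card (assumption)
)
# The Trainer's default optimizer is AdamW, matching the setting above;
# max_seq_len = 512 is applied at tokenization time via max_length=512.
```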