Cicciokr/Roberta-Base-Latin-Uncased


This model is fine-tuned on The Latin Library corpus (15M tokens).

The dataset was cleaned as follows:

Removal of all "pseudo-Latin" text ("Lorem ipsum ...").
Sentence splitting and normalisation with CLTK.
Deduplication of the corpus.
Lowercasing of all text.
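
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the function name `clean_corpus` is hypothetical, a simple regular expression stands in for CLTK's sentence splitter and normaliser, and dropping whole documents that contain "Lorem ipsum" is an assumption about how the pseudo-Latin text was removed. The sketch also lowercases before deduplicating so that case variants collapse, which may differ from the order used for the real dataset.

```python
import re

# Matches the classic placeholder text, in any casing.
LOREM_RE = re.compile(r"lorem\s+ipsum", re.IGNORECASE)
# Crude stand-in for CLTK sentence splitting: break after ., ! or ?
# followed by whitespace. It will mis-split abbreviations such as "M. Tullius".
SENT_END_RE = re.compile(r"(?<=[.!?])\s+")


def clean_corpus(documents):
    """Return a list of unique, lowercased sentences (hypothetical helper)."""
    seen = set()
    cleaned = []
    for doc in documents:
        # Assumption: a document containing "Lorem ipsum" is dropped entirely.
        if LOREM_RE.search(doc):
            continue
        for sentence in SENT_END_RE.split(doc):
            # Collapse runs of whitespace, then lowercase.
            sentence = " ".join(sentence.split()).lower()
            # Keep only the first occurrence of each sentence.
            if sentence and sentence not in seen:
                seen.add(sentence)
                cleaned.append(sentence)
    return cleaned
```

Calling `clean_corpus` on a list of raw document strings returns the surviving sentences in their original order of first appearance.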


Model size (Safetensors): 124M params
Tensor type: F32
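
Because the training text was lowercased, it is probably sensible to lowercase inputs before querying the model. The sketch below shows one way to do that with the Hugging Face `transformers` fill-mask pipeline. It assumes the checkpoint ships a masked-language-modelling head and uses the usual RoBERTa mask token `<mask>`; neither is stated on this card. The helper names `lowercase_keep_mask` and `predict_masked` are hypothetical.

```python
def lowercase_keep_mask(text, mask_token="<mask>"):
    """Lowercase the text while leaving the mask token untouched."""
    parts = text.split(mask_token)
    return mask_token.join(part.lower() for part in parts)


def predict_masked(text, model_id="Cicciokr/Roberta-Base-Latin-Uncased", top_k=5):
    """Return (token, score) pairs for the masked position.

    Assumes `transformers` and a backend such as PyTorch are installed and
    that the model can be downloaded from the Hugging Face Hub.
    """
    # Imported lazily so the text helper above works without transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    results = fill_mask(lowercase_keep_mask(text), top_k=top_k)
    return [(r["token_str"], r["score"]) for r in results]
```

For example, `predict_masked("Gallia est omnis divisa in partes <mask>.")` would return the top five candidate tokens with their scores; the input must contain exactly one `<mask>` for the result to have this flat shape.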


Model tree for Cicciokr/Roberta-Base-Latin-Uncased

Base model: ClassCat/roberta-base-latin-v2 (this model is a fine-tune of it)
