metadata

inference: false
language:
  - bg
license: mit
datasets:
  - oscar
  - chitanka
  - wikipedia
tags:
  - torch

BERT BASE (cased)

A BERT base model pretrained on Bulgarian text using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is cased: it distinguishes between bulgarian and Bulgarian. The training data is Bulgarian text from OSCAR, Chitanka and Wikipedia.

The model was compressed via progressive module replacing, the technique known as BERT-of-Theseus (hence the "theseus" in the model name).
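To give a feel for what progressive module replacing means, here is a rough toy sketch. It assumes the general idea of the technique: during compression training, each module of the original (predecessor) network is swapped for its smaller (successor) counterpart with some probability, and that probability is raised until only successor modules remain. The names `make_module`, `mix_modules` and `forward` are hypothetical, the "modules" are plain functions rather than Transformer layers, and this is not the code used to train this model.

```python
import random


def make_module(name):
    # Hypothetical stand-in for a network module: it only records its own name.
    return lambda trace: trace + [name]


def mix_modules(predecessor, successor, replace_prob, rng=random):
    """For each position, pick the successor module with probability replace_prob."""
    if len(predecessor) != len(successor):
        raise ValueError("need one successor module per predecessor module")
    return [
        suc if rng.random() < replace_prob else pre
        for pre, suc in zip(predecessor, successor)
    ]


def forward(modules, trace):
    # Run the input through the chosen modules in order.
    for module in modules:
        trace = module(trace)
    return trace


predecessor = [make_module(f"pre{i}") for i in range(3)]
successor = [make_module(f"suc{i}") for i in range(3)]

# Raising replace_prob step by step moves the mixed model toward the successor.
for replace_prob in (0.0, 0.5, 1.0):
    mixed = mix_modules(predecessor, successor, replace_prob)
    print(replace_prob, forward(mixed, []))
```

With `replace_prob=0.0` the forward pass uses only predecessor modules, and with `replace_prob=1.0` only successor modules; values in between give a random mixture on each call.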

How to use

Here is how to use this model in PyTorch:

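>>> from transformers import pipeline
>>>
>>> # Build a fill-mask pipeline; device=0 selects the first GPU (use device=-1 for CPU)
>>> model = pipeline(
...     'fill-mask',
...     model='rmihaylov/bert-base-theseus-bg',
...     tokenizer='rmihaylov/bert-base-theseus-bg',
...     device=0,
...     revision=None)
>>>
>>> # Returns a list of candidate fills for [MASK], each with a score
>>> output = model("Кой ще поеме [MASK] отговорност?")
>>> print(output)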
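The raw pipeline output is a list of dictionaries, which can be hard to read when printed directly. The sketch below shows one possible way to keep only the highest-scoring fills. It assumes the usual fill-mask output for a single `[MASK]`, where each entry has `score` and `token_str` keys; `summarize_predictions` and `fill_mask` are hypothetical helper names, not part of `transformers`.

```python
def summarize_predictions(predictions, top_k=3):
    """Return (token_str, score) pairs for the top_k highest-scoring fills."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(p["score"], 4)) for p in ranked[:top_k]]


def fill_mask(text, model_name="rmihaylov/bert-base-theseus-bg", device=-1):
    """Run the fill-mask pipeline on text; device=-1 means CPU."""
    # Imported lazily so summarize_predictions works without transformers installed.
    from transformers import pipeline

    model = pipeline(
        "fill-mask", model=model_name, tokenizer=model_name, device=device
    )
    return model(text)
```

For example, `summarize_predictions(fill_mask("Кой ще поеме [MASK] отговорност?"))` would return up to three `(token, score)` pairs, best first.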