sahajBERT Named Entity Recognition

Model description

sahajBERT fine-tuned for named entity recognition (NER) on the Bengali split of WikiANN.

Named Entities predicted by the model:

Label id Label
0 O
1 B-PER
2 I-PER
3 B-ORG
4 I-ORG
5 B-LOC
6 I-LOC
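
The label ids above follow the usual BIO scheme (B- opens an entity, I- continues it, O is outside any entity). As a minimal sketch of how a sequence of predicted ids could be grouped into entity spans, assuming one id per token: the names `ID2LABEL` and `decode_spans` are hypothetical helpers for illustration and are not part of the model or of transformers.

```python
# Mapping copied from the label table above.
ID2LABEL = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC",
}


def decode_spans(label_ids):
    """Group per-token label ids into (entity_type, start, end) spans.

    `end` is exclusive. A stray I- tag with no matching open span is
    treated leniently as the start of a new span (an assumption).
    """
    spans = []
    current = None  # [entity_type, start_index] of the open span, if any
    for i, label_id in enumerate(label_ids):
        label = ID2LABEL[label_id]
        if label == "O":
            prefix, etype = "O", None
        else:
            prefix, etype = label.split("-", 1)

        # An I- tag of the same type simply extends the open span.
        if prefix == "I" and current is not None and current[0] == etype:
            continue

        # Anything else closes the open span first.
        if current is not None:
            spans.append((current[0], current[1], i))
            current = None

        # B- (or a stray I-) opens a new span.
        if prefix in ("B", "I"):
            current = [etype, i]

    if current is not None:
        spans.append((current[0], current[1], len(label_ids)))
    return spans


print(decode_spans([1, 2, 0, 5, 6, 6, 3]))
```

The spans index tokens, not characters, so they would still need to be mapped back to the text through the tokenizer's offsets.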

Intended uses & limitations

How to use

You can use this model directly with a pipeline for token classification:

from transformers import AlbertForTokenClassification, TokenClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NER")

# Initialize model
model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")

# Initialize pipeline
pipeline = TokenClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।" # Change me
output = pipeline(raw_text)
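
A token classification pipeline returns a list of dictionaries, one per predicted token, with keys such as `entity`, `score`, `word`, `start` and `end`. The sketch below shows one way to keep only the confident entity predictions. `filter_entities` and the `min_score` threshold are hypothetical, and `example_output` is a hand-written stand-in, not real output from this model. It also assumes the model config exposes labels such as `B-LOC` rather than generic `LABEL_n` names.

```python
def filter_entities(predictions, min_score=0.5):
    """Return (word, label, score) for non-"O" predictions above min_score."""
    return [
        (p["word"], p["entity"], round(float(p["score"]), 4))
        for p in predictions
        if p["entity"] != "O" and p["score"] >= min_score
    ]


# Hand-written illustration of the output shape (NOT real model predictions).
example_output = [
    {"entity": "B-LOC", "score": 0.98, "index": 1, "word": "ঢাকা", "start": 0, "end": 4},
    {"entity": "B-ORG", "score": 0.31, "index": 3, "word": "ব্যাংক", "start": 5, "end": 11},
]

for word, label, score in filter_entities(example_output):
    print(word, label, score)
```

With real predictions, pass the pipeline's `output` in place of `example_output`.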

Limitations and bias

WIP

Training data

The model was initialized with the pre-trained weights of sahajBERT at step 19519 and fine-tuned on the Bengali split of WikiANN.

Training procedure

Coming soon!

Eval results

loss: 0.11714419722557068

accuracy: 0.9772286821705426

precision: 0.9585365853658536

recall: 0.9651277013752456

f1: 0.9618208516886931
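
As a quick sanity check on the numbers above, F1 can be recomputed from precision and recall. This assumes the reported F1 is the standard harmonic mean of the two; the card does not say which evaluation library produced the metrics. `f1_score` is a hypothetical helper defined here only for illustration.

```python
# Values copied from the eval results above.
precision = 0.9585365853658536
recall = 0.9651277013752456
reported_f1 = 0.9618208516886931


def f1_score(p, r):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


f1 = f1_score(precision, recall)
print(f"recomputed f1: {f1:.6f}")
print("matches reported value:", abs(f1 - reported_f1) < 1e-9)
```

Under that assumption the recomputed value agrees with the reported F1 up to floating-point rounding.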

BibTeX entry and citation info

Coming soon!
