Uzbek POS Tagger
This model predicts Universal Dependencies part-of-speech (POS) tags for Uzbek text.
Model details
The model was fine-tuned on a Universal Dependencies treebank containing approximately 600 annotated sentences. It is based on the XLM-RoBERTa base model and adapted for token classification.
Usage
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Arofat/uzbek-pos-tagger")
model = AutoModelForTokenClassification.from_pretrained("Arofat/uzbek-pos-tagger")
# Prepare text
text = "Men O'zbekistonda yashayman."
tokens = text.split()
# Get predictions
inputs = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# Process outputs
predictions = torch.argmax(outputs.logits, dim=2)
id2label = model.config.id2label
# Get POS tags
pos_tags = []
word_ids = inputs.word_ids(batch_index=0)
prev_word_id = None
for idx, word_id in enumerate(word_ids):
    # Skip special tokens (None) and non-initial subtokens of a word
    if word_id is None or word_id == prev_word_id:
        continue
    pos_tags.append(id2label[predictions[0, idx].item()])
    prev_word_id = word_id
# Print results
for token, tag in zip(tokens, pos_tags):
    print(f"{token}: {tag}")
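The loop that collects POS tags keeps one prediction per word: the one at the word's first subtoken. The sketch below isolates that alignment step so it can be checked without downloading the model. The helper name `first_subtoken_indices` and the sample `word_ids` list are hypothetical and only illustrate the idea; real values come from `inputs.word_ids(batch_index=0)`.

```python
from typing import List, Optional


def first_subtoken_indices(word_ids: List[Optional[int]]) -> List[int]:
    """Return the position of the first subtoken of each word.

    `word_ids` maps every subtoken position to the index of the word it
    came from, with None marking special tokens such as <s> and </s>.
    """
    indices = []
    prev_word_id = None
    for idx, word_id in enumerate(word_ids):
        # Skip special tokens and continuation pieces of the current word.
        if word_id is None or word_id == prev_word_id:
            continue
        indices.append(idx)
        prev_word_id = word_id
    return indices


# Hypothetical alignment for a 3-word input: word 1 is split into three
# subtokens and word 2 into two; the first and last positions are special.
example_word_ids = [None, 0, 1, 1, 1, 2, 2, None]
print(first_subtoken_indices(example_word_ids))  # [1, 2, 5]
```

Indexing `predictions[0]` at these positions yields exactly one tag per input word, which is why `zip(tokens, pos_tags)` lines up in the usage example.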
Limitations
This model was trained on a small dataset (approximately 600 annotated sentences), so it may not generalize well to all domains of Uzbek text.
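Separately from domain coverage, input tokenization may also affect results. Universal Dependencies treebanks annotate punctuation as separate tokens, whereas `text.split()` leaves it attached to the neighbouring word (for example `yashayman.`). The sketch below is one possible pre-tokenizer, not part of the model or its training pipeline. The function name `split_words_and_punct` and the regular expression are assumptions made for illustration.

```python
import re
from typing import List

# Heuristic pattern (an assumption, not the treebank's tokenizer):
#   - a run of letters, optionally continued by apostrophe-like marks so that
#     Uzbek spellings such as "O'zbekistonda" or "tog'" stay in one token
#   - or a run of digits
#   - or a single punctuation character
# Caveat: a closing single quote placed directly after a word is absorbed
# into that word.
_TOKEN_RE = re.compile(r"[^\W\d_]+(?:['ʻʼ’][^\W\d_]*)*|\d+|[^\w\s]")


def split_words_and_punct(text: str) -> List[str]:
    """Split text into word, number, and punctuation tokens."""
    return _TOKEN_RE.findall(text)


print(split_words_and_punct("Men O'zbekistonda yashayman."))
# ['Men', "O'zbekistonda", 'yashayman', '.']
```

The resulting list can be passed to the tokenizer with `is_split_into_words=True` in place of `text.split()`. Whether this improves tagging on a given text is something to verify on your own data.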