Model Card for AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA

This model is a version of ParsBERT, fine-tuned for extractive question answering on the Persian language using the PersianQA dataset.

Model Details

Model Description

This is a ParsBERT model fine-tuned on the SajjadAyoubi/persian_qa dataset. It is designed for extractive question answering, meaning it extracts the answer to a question directly from a given context. The fine-tuning process has significantly improved its ability to understand and respond to questions in Persian compared to the base model.

  • Developed by: Amir Mohammad Ebrahiminasab
  • Shared by: Amir Mohammad Ebrahiminasab
  • Model type: bert
  • Language(s) (NLP): fa (Persian)
  • License: MIT
  • Finetuned from model: pedramyazdipoor/parsbert_question_answering_PQuAD

Uses

Direct Use

The model can be used for extractive question answering in Persian. You can provide a context and a question, and the model will extract the answer span from the context.

from transformers import pipeline

# Load the fine-tuned model and its tokenizer into a QA pipeline.
qa_pipeline = pipeline(
    "question-answering",
    model="AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA",
    tokenizer="AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA"
)

# Context: "Farhad Majidi Ghadikolaei, known as Farhad Majidi, is a football
# player from Iran. He also has a record of playing for the Esteghlal club."
context = "فرهاد مجیدی قادیکلایی مشهور به فرهاد مجیدی بازیکن فوتبال اهل ایران است. او همچنین سابقه بازی در باشگاه استقلال را در کارنامه دارد."
# Question: "For which team has Farhad Majidi played?"
question = "فرهاد مجیدی در چه تیمی سابقه بازی دارد؟"

result = qa_pipeline(question=question, context=context)
# {'score': 0.99..., 'start': 101, 'end': 108, 'answer': 'استقلال'}

print(f"Answer: '{result['answer']}'")  # Answer: 'استقلال' ("Esteghlal")
Bias, Risks, and Limitations

The model's performance is directly shaped by the PersianQA dataset it was fine-tuned on, so it may not perform as well on contexts from other domains or with different linguistic styles. Evaluation also shows a clear drop in Exact Match for answers longer than the dataset's average length (38.56% vs. 53.01% for shorter answers), indicating a bias toward extracting shorter text spans.

Recommendations

Users should be aware of the model's limitations, especially its reduced accuracy on longer answer spans. For critical applications, the model's outputs should be verified.
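
Because the dataset also contains unanswerable questions, one lightweight verification step is to filter the pipeline's predictions by confidence score. Below is a minimal sketch: handle_impossible_answer is a standard option of the transformers QA pipeline, and the 0.5 threshold is an illustrative assumption that should be tuned on held-out data.

from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA",
)

result = qa_pipeline(
    question="فرهاد مجیدی در چه تیمی سابقه بازی دارد؟",  # "For which team has Farhad Majidi played?"
    context="فرهاد مجیدی قادیکلایی مشهور به فرهاد مجیدی بازیکن فوتبال اهل ایران است. او همچنین سابقه بازی در باشگاه استقلال را در کارنامه دارد.",
    handle_impossible_answer=True,  # allow an empty answer for unanswerable questions
)

CONFIDENCE_THRESHOLD = 0.5  # assumption: tune this cutoff on a held-out set
if result["answer"] and result["score"] >= CONFIDENCE_THRESHOLD:
    print(f"Answer: {result['answer']} (score {result['score']:.2f})")
else:
    print("No confident answer; route to manual review.")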

How to Get Started with the Model

Use the code below to get started with the model using PyTorch.

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA")
model = AutoModelForQuestionAnswering.from_pretrained("AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA")

context = "پایتخت اسپانیا شهر مادرید است."  # "The capital of Spain is the city of Madrid."
question = "پایتخت اسپانیا کجاست؟"  # "Where is the capital of Spain?"

# Tokenize the question/context pair and run a forward pass without gradients.
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The most likely start and end token indices of the answer span.
answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()

# Decode the predicted token span back into text.
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
answer = tokenizer.decode(predict_answer_tokens)

print(f"Question: {question}")
print(f"Answer: {answer}")
# Answer: مادرید ("Madrid")

Training Details

Training Data

The model was fine-tuned on the SajjadAyoubi/persian_qa dataset, which contains question-context-answer triplets in Persian.
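
For reference, the dataset can be loaded directly from the Hugging Face Hub. The field names below follow the SQuAD-style schema the dataset uses; unanswerable questions carry an empty answers list.

from datasets import load_dataset

# Load the PersianQA train and validation splits.
dataset = load_dataset("SajjadAyoubi/persian_qa")
print(dataset)

# Each record pairs a question with a context and a SQuAD-style answers dict.
example = dataset["train"][0]
print(example["question"])
print(example["answers"])  # {'text': [...], 'answer_start': [...]}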

Training Procedure

Preprocessing

The training data was preprocessed by tokenizing question and context pairs. Long contexts were handled by creating multiple features for a single example using a sliding window approach (doc_stride). The start and end token positions for the answer were identified in the tokenized input.
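
A condensed sketch of this preprocessing is shown below. The max_length of 384 and doc_stride of 128 are common defaults and are assumptions here, not confirmed training values; the answer-position labeling step is omitted for brevity.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA")

def preprocess(examples, max_length=384, doc_stride=128):
    # Tokenize question/context pairs; long contexts overflow into extra features.
    return tokenizer(
        examples["question"],
        examples["context"],
        truncation="only_second",        # truncate only the context, never the question
        max_length=max_length,
        stride=doc_stride,               # overlap between consecutive windows
        return_overflowing_tokens=True,  # one example can yield several features
        return_offsets_mapping=True,     # needed to locate answer start/end tokens
        padding="max_length",
    )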

Training Hyperparameters

The model was trained with the following hyperparameters:

| Argument | Value |
| --- | --- |
| Learning Rate | $2 \times 10^{-5}$ |
| Training Epochs | 10 |
| Train Batch Size | 8 |
| Evaluation Batch Size | 8 |
| Weight Decay | 0.01 |
| Scheduler Type | Cosine |
| Warmup Ratio | 0.1 |
| Best Model Metric | F1-Score |
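
Expressed as transformers TrainingArguments, these settings would look roughly as follows. The output_dir is a placeholder, metric_for_best_model="f1" assumes the metric key used during evaluation, and older transformers releases spell eval_strategy as evaluation_strategy.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-fa-qa-finetuned-on-persianqa",  # placeholder path
    learning_rate=2e-5,
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    eval_strategy="epoch",       # evaluate once per epoch
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",  # select the best checkpoint by F1-Score
)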

Speeds, Sizes, Times

  • The full fine-tuning process took approximately 1 hour and 22 minutes (≈1.37 hours) on a single NVIDIA T4 GPU.

Evaluation

The model was evaluated on the validation split of the SajjadAyoubi/persian_qa dataset.

Testing Data, Factors & Metrics

Testing Data

The evaluation was performed on the validation set of the SajjadAyoubi/persian_qa dataset.

Factors

The model's performance was analyzed based on two factors:

  • Answer Presence: Performance was measured separately for questions that have an answer in the context versus those that do not.
  • Answer Length: Performance was analyzed for answers shorter than the validation set average (22.78 characters) and those longer than the average.

Metrics

  • F1-Score: The primary metric, measuring the harmonic mean of precision and recall over token overlap between the prediction and the ground truth.
  • Exact Match (EM): The percentage of predictions that match the ground truth answer exactly (see the computation sketch below).
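
Both metrics can be computed with the evaluate library. The sketch below uses the squad_v2 metric because the dataset contains unanswerable questions; whether the original evaluation used this exact metric implementation is an assumption.

import evaluate

squad_v2 = evaluate.load("squad_v2")

# One toy prediction/reference pair in the format squad_v2 expects.
predictions = [{"id": "0", "prediction_text": "استقلال", "no_answer_probability": 0.0}]
references = [{"id": "0", "answers": {"text": ["استقلال"], "answer_start": [101]}}]

results = squad_v2.compute(predictions=predictions, references=references)
print(results["exact"], results["f1"])  # 100.0 100.0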

Results

Summary

Overall Performance on the Validation Set

| Model Status | Exact Match | F1-Score |
| --- | --- | --- |
| Fine-Tuned Model (10 Epochs) | 55.59% | 71.97% |

Performance on Data Subsets

| Case Type | Exact Match | F1-Score |
| --- | --- | --- |
| Has Answer | 44.70% | 68.22% |
| No Answer | 78.14% | 78.14% |

| Answer Length | Exact Match | F1-Score |
| --- | --- | --- |
| Longer than Avg. | 38.56% | 69.80% |
| Shorter than Avg. | 53.01% | 68.88% |

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator.

  • Hardware Type: T4 GPU
  • Hours used: 1.37
  • Cloud Provider: Google Colab
  • Carbon Emitted: [Not Calculated]

Technical Specifications

Model Architecture and Objective

The model follows the BERT-base architecture (~162M parameters, stored as float32 in safetensors format) with a linear layer on top of the hidden-state outputs that predicts the start and end positions of the answer span. The training objective was to minimize the cross-entropy loss over the start and end token positions of the answer.
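
This objective can be written out explicitly. The sketch below mirrors the loss that AutoModelForQuestionAnswering computes internally when start_positions and end_positions are supplied:

import torch.nn.functional as F

def qa_loss(start_logits, end_logits, start_positions, end_positions):
    # start_logits, end_logits: (batch, seq_len); positions: (batch,) token indices
    start_loss = F.cross_entropy(start_logits, start_positions)
    end_loss = F.cross_entropy(end_logits, end_positions)
    # Average the two cross-entropy terms, as in BertForQuestionAnswering.
    return (start_loss + end_loss) / 2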

Compute Infrastructure

Hardware

The model was trained on a single NVIDIA T4 GPU.

Software

  • transformers
  • torch
  • datasets
  • evaluate

Model Card Authors

Amir Mohammad Ebrahiminasab

Model Card Contact

[email protected]
