Model Card for AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA
This model is a version of ParsBERT, fine-tuned for extractive question answering in Persian using the PersianQA dataset.
Model Details
Model Description
This is a ParsBERT model fine-tuned on the SajjadAyoubi/persian_qa dataset. It is designed for extractive question answering, meaning it extracts the answer to a question directly from a given context. The fine-tuning process has significantly improved its ability to understand and respond to questions in Persian compared to the base model.
- Developed by: Amir Mohammad Ebrahiminasab
- Shared by: Amir Mohammad Ebrahiminasab
- Model type: bert
- Language(s) (NLP): fa (Persian)
- License: MIT
- Finetuned from model: pedramyazdipoor/parsbert_Youtubeing_PQuAD
Model Sources
- Repository: https://huggingface.co/AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA
- Demo: https://huggingface.co/spaces/AmoooEBI/ParsBert-QA-Chatbot
Uses
Direct Use
The model can be used for extractive question answering in Persian. You can provide a context and a question, and the model will extract the answer span from the context.
from transformers import pipeline
qa_pipeline = pipeline(
    "question-answering",
    model="AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA",
    tokenizer="AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA",
)
context = "فرهاد مجیدی قادیکلایی مشهور به فرهاد مجیدی بازیکن فوتبال اهل ایران است. او همچنین سابقه بازی در باشگاه استقلال را در کارنامه دارد."
question = "فرهاد مجیدی در چه تیمی سابقه بازی دارد؟"
result = qa_pipeline(question=question, context=context)
# {'score': 0.99..., 'start': 101, 'end': 108, 'answer': 'استقلال'}
print(f"Answer: '{result['answer']}'")
Bias, Risks, and Limitations
The model's performance is directly influenced by the content of the PersianQA dataset, so it may perform worse on contexts from other domains or with different linguistic styles. For answers longer than the validation-set average, Exact Match drops from 53.01% to 38.56% while F1 stays comparable (68.88% vs. 69.80%). This suggests the model usually locates long answers but misses their exact boundaries, indicating a potential bias towards extracting shorter text spans.
Recommendations
Users should be aware of the model's limitations, especially its lower exact-match accuracy on longer answer spans. For critical applications, the model's outputs should be verified.
How to Get Started with the Model
Use the code below to get started with the model using PyTorch.
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch
tokenizer = AutoTokenizer.from_pretrained("AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA")
model = AutoModelForQuestionAnswering.from_pretrained("AmoooEBI/Bert-fa-qa-finetuned-on-PersianQA")
context = "پایتخت اسپانیا شهر مادرید است."
question = "پایتخت اسپانیا کجاست؟"
# Encode the question and context as a single pair.
inputs = tokenizer(question, context, return_tensors="pt")
# Inference only, so gradients are not needed.
with torch.no_grad():
    outputs = model(**inputs)
# Take the most likely start and end token positions.
answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()
# Slice the answer tokens (end index inclusive) and decode them to text.
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
answer = tokenizer.decode(predict_answer_tokens)
print(f"Question: {question}")
print(f"Answer: {answer}")
# Answer: مادرید
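Independent argmax decoding, as used above, can return an end index that precedes the start index, and it does not explicitly handle unanswerable questions. The following is a minimal sketch of SQuAD-v2-style span selection, not the author's post-processing. It assumes position 0 is the `[CLS]` token and that a best span at that position means "no answer"; `best_span`, `max_answer_len`, and the toy logits are hypothetical.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick the highest-scoring (start, end) pair with start <= end.

    Returns (0, 0, null_score) when no span beats the [CLS] position,
    which this sketch treats as "no answer".
    Simplification: a real pipeline would also skip question tokens.
    """
    null_score = start_logits[0] + end_logits[0]
    best = (0, 0, null_score)
    for s in range(1, len(start_logits)):
        # Only consider ends at or after the start, within the length cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy logits for a 6-token input (hypothetical values).
start = [0.1, 0.2, 3.0, 0.3, 0.1, 0.0]
end = [0.1, 0.0, 0.5, 2.5, 0.2, 0.1]
print(best_span(start, end))  # (2, 3, 5.5)
```

With the model above, the inputs would be `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`.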
Training Details
Training Data
The model was fine-tuned on the SajjadAyoubi/persian_qa dataset, which contains question-context-answer triplets in Persian.
Training Procedure
Preprocessing
The training data was preprocessed by tokenizing question and context pairs. Long contexts were handled by creating multiple features for a single example using a sliding window approach (doc_stride). The start and end token positions for the answer were identified in the tokenized input.
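The sliding-window idea can be illustrated without a tokenizer. This is a minimal sketch, assuming `stride` is the number of tokens shared by consecutive windows; `sliding_windows`, `label_window`, and the toy sizes are hypothetical and do not reproduce the actual training script.

```python
def sliding_windows(num_tokens, max_len, stride):
    """Return (start, end) token ranges; consecutive windows overlap by `stride`."""
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    windows = []
    start = 0
    while True:
        end = min(start + max_len, num_tokens)
        windows.append((start, end))
        if end == num_tokens:
            break
        start += max_len - stride
    return windows


def label_window(window, answer_start, answer_end):
    """Answer span (inclusive) relative to the window, or None if not fully inside.

    In the usual Hugging Face recipe, such windows are labeled with the
    [CLS] index instead of None.
    """
    w_start, w_end = window
    if w_start <= answer_start and answer_end < w_end:
        return (answer_start - w_start, answer_end - w_start)
    return None


# A 10-token context, windows of 4 tokens, overlap of 2; answer at tokens 5..6.
windows = sliding_windows(10, max_len=4, stride=2)
print(windows)  # [(0, 4), (2, 6), (4, 8), (6, 10)]
print([label_window(w, 5, 6) for w in windows])  # [None, None, (1, 2), None]
```

Only the window that fully contains the answer receives span labels. The overlap reduces the chance that an answer is split across every window.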
Training Hyperparameters
The model was trained with the following hyperparameters:
Argument | Value |
---|---|
Learning Rate | $2 \times 10^{-5}$ |
Training Epochs | 10 |
Train Batch Size | 8 |
Evaluation Batch Size | 8 |
Weight Decay | 0.01 |
Scheduler Type | Cosine |
Warmup Ratio | 0.1 |
Best Model Metric | F1-Score |
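To show how the learning rate, scheduler type, and warmup ratio interact, here is a minimal sketch of a linear-warmup, half-cosine-decay schedule. It assumes the scheduler follows this common pattern. `lr_at_step` and `total_steps=1000` are hypothetical, since the card does not state the number of optimization steps.

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run of 1000 steps: the peak is reached when warmup ends (step 100).
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step, 1000):.2e}")
```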
Speeds, Sizes, Times
- The full fine-tuning process took approximately 1 hour and 22 minutes on a single GPU.
Evaluation
The model was evaluated on the validation split of the SajjadAyoubi/persian_qa dataset.
Testing Data, Factors & Metrics
Testing Data
The evaluation was performed on the validation set of the SajjadAyoubi/persian_qa dataset.
Factors
The model's performance was analyzed based on two factors:
- Answer Presence: Performance was measured separately for questions that have an answer in the context versus those that do not.
- Answer Length: Performance was analyzed for answers shorter than the validation set average (22.78 characters) and those longer than the average.
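The two factors above amount to bucketing the validation examples before scoring. This is a rough sketch under two assumptions: the average length is computed in characters over answerable examples only, and answers exactly at the average count as "longer". `split_by_factors` and the `answer` field are hypothetical names.

```python
def split_by_factors(examples):
    """Bucket QA examples by answer presence and by answer length vs. the mean."""
    answered = [ex for ex in examples if ex["answer"]]
    unanswered = [ex for ex in examples if not ex["answer"]]
    # Assumption: mean character length over answerable examples only.
    mean_len = sum(len(ex["answer"]) for ex in answered) / max(1, len(answered))
    return {
        "has_answer": answered,
        "no_answer": unanswered,
        "shorter": [ex for ex in answered if len(ex["answer"]) < mean_len],
        "longer": [ex for ex in answered if len(ex["answer"]) >= mean_len],
        "mean_len": mean_len,
    }


# Toy examples (hypothetical); an empty string marks an unanswerable question.
toy = [{"answer": "استقلال"}, {"answer": "شهر مادرید در اسپانیا"}, {"answer": ""}]
groups = split_by_factors(toy)
print({k: len(v) for k, v in groups.items() if k != "mean_len"})
```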
Metrics
- F1-Score: The primary metric, measuring the harmonic mean of precision and recall on token overlap.
- Exact Match (EM): The percentage of predictions that perfectly match the ground truth answer.
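Both metrics can be sketched in a few lines. This is a simplified version that tokenizes on whitespace only; the SQuAD-style metrics in the `evaluate` library apply additional text normalization, so absolute numbers may differ. The function names are illustrative.

```python
from collections import Counter


def exact_match(prediction, reference):
    """1.0 if the whitespace-tokenized strings are identical, else 0.0."""
    return float(prediction.split() == reference.split())


def f1_score(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Unanswerable case: full credit only when both are empty.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# The prediction contains one extra token: EM is 0, F1 is 2/3.
print(exact_match("باشگاه استقلال", "استقلال"), f1_score("باشگاه استقلال", "استقلال"))
```

For unanswerable questions the F1 computation reduces to Exact Match, which is consistent with the identical EM and F1 values reported for the No Answer subset.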
Results
Summary
Overall Performance on the Validation Set
Model Status | Exact Match | F1-Score |
---|---|---|
Fine-Tuned Model (10 Epochs) | 55.59% | 71.97% |
Performance on Data Subsets
Case Type | Exact Match | F1-Score |
---|---|---|
Has Answer | 44.70% | 68.22% |
No Answer | 78.14% | 78.14% |
Answer Length | Exact Match | F1-Score |
---|---|---|
Longer than Avg. | 38.56% | 69.80% |
Shorter than Avg. | 53.01% | 68.88% |
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator.
- Hardware Type: T4 GPU
- Hours used: 1.37
- Cloud Provider: Google Colab
- Carbon Emitted: [Not Calculated]
Technical Specifications
Model Architecture and Objective
The model is a BERT-base architecture with a linear layer on top of the hidden-states output that produces a start logit and an end logit for each token. The objective was to minimize the cross-entropy loss for the start and end token positions of the answer.
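Written out, this objective is commonly the mean of two cross-entropy terms. The symbols below are introduced here for illustration, and the equal weighting follows the question-answering heads in `transformers` rather than anything stated in this card: $s_k$ and $t_k$ are the start and end logits for token $k$, and $i^*$ and $j^*$ are the gold start and end positions.

```latex
\mathcal{L} = \frac{1}{2}\left(
  -\log \frac{e^{s_{i^*}}}{\sum_{k} e^{s_k}}
  \;-\; \log \frac{e^{t_{j^*}}}{\sum_{k} e^{t_k}}
\right)
```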
Compute Infrastructure
Hardware
The model was trained on a single NVIDIA T4 GPU.
Software
- transformers
- torch
- datasets
- evaluate
Model Card Authors
Amir Mohammad Ebrahiminasab
Model Card Contact