---
language: fa
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- cross-encoder
- reranker
- persian
- farsi
- xlm-roberta
- scientific-qa
datasets:
- PersianSciQA
---

# Cross-Encoder for Persian Scientific Relevance Ranking

This is a cross-encoder model based on `xlm-roberta-large`, fine-tuned for relevance ranking of Persian scientific texts. It takes a question and a document (a scientific abstract) as input and outputs a score from 0 to 1 indicating their relevance. The model was trained as a reranker for a Persian scientific question answering system.

## Model Details

- **Base Model:** `xlm-roberta-large`
- **Task:** Reranking / Sentence Similarity
- **Fine-tuning Framework:** `sentence-transformers`
- **Language:** Persian (fa)

## Intended Use

The primary use of this model is as a **reranker** in a search or question-answering pipeline. Given a user's query and a list of candidate documents retrieved by a faster first-stage model (such as BM25 or a bi-encoder), this cross-encoder re-scores the top candidates to produce a more accurate final ranking.

### How to Use

First, install the `sentence-transformers` library:

```bash
pip install -U sentence-transformers
```

Then load the model and score query-document pairs:

```python
from sentence_transformers import CrossEncoder

# Load the model from the Hugging Face Hub
model_name = 'YOUR_HF_USERNAME/reranker-xlm-roberta-large'  # <-- IMPORTANT: Replace with your model name!
model = CrossEncoder(model_name)

# Prepare your query and candidate documents
query = "روش های ارزیابی در بازیابی اطلاعات چیست؟"  # "What are the evaluation methods in information retrieval?"

documents = [
    "بازیابی اطلاعات یک فرآیند پیچیده است که شامل شاخص گذاری و جستجوی اسناد می شود. ارزیابی آن اغلب با معیارهایی مانند دقت و بازیابی انجام می شود.",  # "Information retrieval is a complex process involving indexing and searching documents. Its evaluation is often done with metrics like precision and recall."
    "یادگیری عمیق در سال های اخیر پیشرفت های چشمگیری در پردازش زبان طبیعی داشته است.",  # "Deep learning has made significant progress in natural language processing in recent years."
    "این مقاله به بررسی روش های جدید برای ارزیابی سیستم های بازیابی اطلاعات معنایی می پردازد و معیارهای نوینی را معرفی می کند.",  # "This paper examines new methods for evaluating semantic information retrieval systems and introduces novel metrics."
]

# Create (query, document) pairs for scoring
sentence_pairs = [[query, doc] for doc in documents]

# Predict the relevance scores
scores = model.predict(sentence_pairs, convert_to_numpy=True)

# Print the results
for score, doc in zip(scores, documents):
    print(f"Score: {score:.4f}\t Document: {doc}")

# Expected output (scores will vary but should follow this trend; translations shown for readability):
# Score: 0.9123   Document: "This paper examines new methods for evaluating semantic information retrieval systems..."
# Score: 0.7543   Document: "Information retrieval is a complex process involving indexing and searching documents..."
# Score: 0.0123   Document: "Deep learning has made significant progress in natural language processing..."
```

## Training Data

This model was fine-tuned on the **PersianSciQA** dataset.

- **Description:** PersianSciQA is a large-scale dataset containing 39,809 Persian scientific question-answer pairs. It was generated using a two-stage process with `gpt-4o-mini` on a corpus of scientific abstracts from IranDoc's "Ganj" repository.
- **Content:** The dataset consists of questions paired with scientific abstracts, primarily from engineering fields.
- **Labels:** Each pair has a relevance score from 0 (Not Relevant) to 3 (Highly Relevant), which was normalized to a 0-1 float for training.

## Training Procedure

The model was trained using the provided `train_reranker.py` script with the following configuration (a minimal reconstruction is sketched after this list):

- **Epochs:** 2
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Loss Function:** MSELoss, i.e. regression on the normalized 0-1 labels
- **Evaluator:** `CECorrelationEvaluator`, used to save the best model based on Spearman's rank correlation on the validation set
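For reference, the label normalization and the configuration above can be reproduced roughly as follows. This is a minimal sketch using the legacy `CrossEncoder.fit` API of `sentence-transformers` (v2/v3), not the actual `train_reranker.py` script; the `rows` data, the dev split, and the output path are hypothetical placeholders.

```python
import torch
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample
from sentence_transformers.cross_encoder.evaluation import CECorrelationEvaluator

# Hypothetical (question, abstract, raw_label) rows; raw labels are in {0, 1, 2, 3}.
rows = [
    ("روش های ارزیابی در بازیابی اطلاعات چیست؟",
     "این مقاله به بررسی روش های جدید برای ارزیابی سیستم های بازیابی اطلاعات معنایی می پردازد.",
     3),
]

# Normalize the 0-3 relevance labels to 0-1 floats for regression training.
train_samples = [InputExample(texts=[q, doc], label=raw / 3.0) for q, doc, raw in rows]
dev_samples = train_samples  # placeholder: use a real held-out validation split

model = CrossEncoder("xlm-roberta-large", num_labels=1)
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)

# Computes Spearman's rank correlation on the dev set; fit() keeps the best checkpoint.
evaluator = CECorrelationEvaluator.from_input_examples(dev_samples, name="dev")

model.fit(
    train_dataloader=train_dataloader,
    evaluator=evaluator,
    epochs=2,
    loss_fct=torch.nn.MSELoss(),
    optimizer_params={"lr": 2e-5},
    output_path="reranker-xlm-roberta-large",  # hypothetical output directory
)
```

At inference time, `CrossEncoder.predict` applies a sigmoid activation by default for single-label models, which is why the scores land in the 0-1 range described above.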
## Evaluation

The PersianSciQA paper reports substantial agreement between the LLM-assigned labels used for training and human expert judgments (Cohen's kappa of 0.6642). The human validation study also confirmed the high quality of the generated questions (88.60% rated acceptable) and of the relevance assessments.

## Citation

If you use this model or the PersianSciQA dataset in your research, please cite the original paper. (Note: the paper is a pre-print; please update the citation information once it is officially published.)

```bibtex
@inproceedings{PersianSciQA2025,
  title     = {PersianSciQA: A new Dataset for Bridging the Language Gap in Scientific Question Answering},
  author    = {Anonymous},
  year      = {2025},
  booktitle = {Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP)},
  note      = {Confidential review copy. To be updated upon publication.}
}
```