|
--- |
|
language: fa |
|
library_name: sentence-transformers |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- cross-encoder |
|
- reranker |
|
- persian |
|
- farsi |
|
- xlm-roberta |
|
- scientific-qa |
|
datasets:
|
- PersianSciQA |
|
--- |
|
|
|
# Cross-Encoder for Persian Scientific Relevance Ranking |
|
|
|
This is a cross-encoder model based on `xlm-roberta-large` that has been fine-tuned for relevance ranking of Persian scientific texts. It takes a question and a document (an abstract) as input and outputs a score from 0 to 1 indicating their relevance. |
|
|
|
This model was trained as a reranker for a Persian scientific Question Answering system. |
|
|
|
## Model Details |
|
|
|
- **Base Model:** `xlm-roberta-large` |
|
- **Task:** Reranking / Sentence Similarity |
|
- **Fine-tuning Framework:** `sentence-transformers` |
|
- **Language:** Persian (fa) |
|
|
|
## Intended Use |
|
|
|
The primary use of this model is as a **reranker** in a search or question-answering pipeline. Given a user's query and a list of candidate documents retrieved by a faster first-stage model (such as BM25 or a bi-encoder), this cross-encoder re-scores the top candidates to produce a more accurate final ranking (see the two-stage sketch at the end of the How to Use section below).
|
|
|
### How to Use |
|
|
|
To use the model, first install the `sentence-transformers` library: |
|
```bash |
|
pip install -U sentence-transformers
```

Then load the model and score query-document pairs:

```python
from sentence_transformers import CrossEncoder
|
|
|
# Load the model from the Hugging Face Hub |
|
model_name = 'YOUR_HF_USERNAME/reranker-xlm-roberta-large' #<-- IMPORTANT: Replace with your model name! |
|
model = CrossEncoder(model_name) |
|
|
|
# Prepare your query and document pairs |
|
query = "روش های ارزیابی در بازیابی اطلاعات چیست؟" # "What are the evaluation methods in information retrieval?" |
|
documents = [ |
|
"بازیابی اطلاعات یک فرآیند پیچیده است که شامل شاخص گذاری و جستجوی اسناد می شود. ارزیابی آن اغلب با معیارهایی مانند دقت و بازیابی انجام می شود.", # "Information retrieval is a complex process involving indexing and searching documents. Its evaluation is often done with metrics like precision and recall." |
|
"یادگیری عمیق در سال های اخیر پیشرفت های چشمگیری در پردازش زبان طبیعی داشته است.", # "Deep learning has made significant progress in natural language processing in recent years." |
|
"این مقاله به بررسی روش های جدید برای ارزیابی سیستم های بازیابی اطلاعات معنایی می پردازد و معیارهای نوینی را معرفی می کند." # "This paper examines new methods for evaluating semantic information retrieval systems and introduces novel metrics." |
|
] |
|
|
|
# Create pairs for scoring |
|
sentence_pairs = [[query, doc] for doc in documents] |
|
|
|
# Predict the scores |
|
scores = model.predict(sentence_pairs, convert_to_numpy=True) |
|
|
|
# Print results |
|
for score, doc in zip(scores, documents):
    print(f"Score: {score:.4f}\t Document: {doc}")
|
|
|
# Expected output (scores will vary, but should follow this trend;
# English translations shown here for readability):
# Score: 0.9123    Document: This paper examines new methods for evaluating semantic information retrieval systems and introduces novel metrics.
# Score: 0.7543    Document: Information retrieval is a complex process involving indexing and searching documents. Its evaluation is often done with metrics like precision and recall.
# Score: 0.0123    Document: Deep learning has made significant progress in natural language processing in recent years.
```
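For the two-stage setup described under Intended Use, the sketch below pairs this model with a first-stage bi-encoder; it continues from the `query`, `documents`, and `model_name` variables defined above. The bi-encoder choice and the `top_k` value are illustrative assumptions, not part of the released pipeline.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# First stage: a fast multilingual bi-encoder retrieves candidates
# (model choice is an assumption; any Persian-capable bi-encoder works).
bi_encoder = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")
# Second stage: this cross-encoder re-scores the retrieved candidates.
cross_encoder = CrossEncoder(model_name)  # the reranker loaded above

corpus = documents  # your document collection; here, the abstracts from above
corpus_embeddings = bi_encoder.encode(corpus, convert_to_tensor=True)
query_embedding = bi_encoder.encode(query, convert_to_tensor=True)

# Retrieve the top candidates by cosine similarity (top_k is illustrative).
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]

# Re-score the candidates with the cross-encoder and sort by score.
pairs = [[query, corpus[hit["corpus_id"]]] for hit in hits]
rerank_scores = cross_encoder.predict(pairs)
for score, (_, doc) in sorted(zip(rerank_scores, pairs), key=lambda x: x[0], reverse=True):
    print(f"{score:.4f}\t{doc}")
```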
|
## Training Data

This model was fine-tuned on the PersianSciQA dataset.

- **Description:** PersianSciQA is a large-scale dataset containing 39,809 Persian scientific question-answer pairs. It was generated using a two-stage process with `gpt-4o-mini` on a corpus of scientific abstracts from IranDoc's 'Ganj' repository.
- **Content:** Questions paired with scientific abstracts, primarily from engineering fields.
- **Labels:** Each pair has a relevance score from 0 (Not Relevant) to 3 (Highly Relevant), which was normalized to a 0-1 float for training.
|
|
|
|
|
## Training Procedure

The model was trained using the provided `train_reranker.py` script with the following configuration (a minimal sketch of an equivalent setup follows the list):

- **Epochs:** 2
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Loss Function:** MSELoss (regression on the normalized 0-1 labels)
- **Evaluator:** `CECorrelationEvaluator`, used to save the best model based on Spearman's rank correlation on the validation set
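The `train_reranker.py` script itself is not reproduced here; the following is a minimal sketch of an equivalent setup using the classic `sentence-transformers` CrossEncoder `fit` API. The inline samples, `max_length`, `warmup_steps`, and output path are assumptions, not values taken from the original script.

```python
import torch
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CECorrelationEvaluator

# Each sample pairs a question with an abstract; the 0-3 relevance
# labels are normalized to 0-1 floats (label = raw_score / 3.0).
# These inline samples are stand-ins for the PersianSciQA splits.
train_samples = [
    InputExample(texts=["سوال نمونه", "چکیده مرتبط"], label=3 / 3.0),
    InputExample(texts=["سوال نمونه", "چکیده نامرتبط"], label=0 / 3.0),
]
dev_samples = train_samples  # stand-in; use the real validation split

model = CrossEncoder("xlm-roberta-large", num_labels=1, max_length=512)  # max_length is an assumption
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
evaluator = CECorrelationEvaluator.from_input_examples(dev_samples, name="dev")

model.fit(
    train_dataloader=train_dataloader,
    evaluator=evaluator,
    epochs=2,
    loss_fct=torch.nn.MSELoss(),  # regression on the normalized labels
    optimizer_params={"lr": 2e-5},
    warmup_steps=100,  # assumption; not stated above
    output_path="output/reranker-xlm-roberta-large",  # best model by Spearman saved here
)
```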
|
|
|
## Evaluation

The PersianSciQA paper reports substantial agreement between the LLM-assigned labels used for training and human expert judgments (Cohen's Kappa of 0.6642). The human validation study confirmed the high quality of the generated questions (88.60% acceptable) and of the relevance assessments.
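To check the reranker against your own labeled (question, abstract) pairs, the same `CECorrelationEvaluator` used during training can be run directly. A minimal sketch follows; the pairs and gold labels below are stand-ins, and the model name is the placeholder used earlier.

```python
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CECorrelationEvaluator

model = CrossEncoder("YOUR_HF_USERNAME/reranker-xlm-roberta-large")  # placeholder name

# Stand-in test pairs: gold 0-3 relevance scores normalized to 0-1
test_samples = [
    InputExample(texts=["سوال نمونه", "چکیده مرتبط"], label=3 / 3.0),
    InputExample(texts=["سوال نمونه", "چکیده تا حدی مرتبط"], label=1 / 3.0),
    InputExample(texts=["سوال نمونه", "چکیده نامرتبط"], label=0 / 3.0),
]

evaluator = CECorrelationEvaluator.from_input_examples(test_samples, name="test")
spearman = evaluator(model)  # returns Spearman's rank correlation
print(f"Spearman: {spearman:.4f}")
```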
|
|
|
|
|
|
|
## Citation

If you use this model or the PersianSciQA dataset in your research, please cite the original paper.

(Note: the paper is a pre-print; please update the citation information once it is officially published.)

```bibtex
@inproceedings{PersianSciQA2025,
  title     = {PersianSciQA: A new Dataset for Bridging the Language Gap in Scientific Question Answering},
  author    = {Anonymous},
  year      = {2025},
  booktitle = {Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP)},
  note      = {Confidential review copy. To be updated upon publication.}
}
```