---
license: apache-2.0
base_model: ViraIntelligentDataMining/PersianLLaMA-13B
library_name: peft
tags:
- peft
- lora
- persian
- farsi
- question-generation
- scientific-abstracts
- research
- nlp
language:
- fa
pipeline_tag: text-generation
---

# PersianSciQA-LoRA: Scientific Question Generation for Persian Literature

A specialized LoRA adapter that turns PersianLLaMA-13B into a question generation system for Persian scientific abstracts.

## Academic Overview

**PersianSciQA-LoRA** addresses a gap in Persian language processing: automatic question generation for academic text. The adapter generates relevant questions from Persian scientific abstracts across multiple domains.

### Research Contributions

- First specialized Persian question generation model for scientific literature
- Parameter-efficient fine-tuning using the LoRA methodology
- Cross-domain validation on medical, engineering, and computer science abstracts
- Substantial quality gains at minimal computational cost (~0.5% of parameters trained)

## Model Specifications

| Parameter | Value |
|-----------|-------|
| **Base Model** | PersianLLaMA-13B (13 billion parameters) |
| **Adaptation Method** | LoRA (Low-Rank Adaptation) |
| **LoRA Rank (r)** | 32 |
| **LoRA Alpha** | 64 |
| **Trainable Parameters** | ~67M (~0.5% of base model) |
| **Target Modules** | Query, Key, Value, Output, Gate, Up, Down projections |
| **Training Language** | Persian/Farsi |
| **Domain** | Scientific literature |

## Training Methodology

### Dataset

- **Source**: Curated Persian scientific abstracts
- **Quality Filter**: Relevance scores 2-3 (high quality)
- **Domains**: Medical, Engineering, Computer Science, Physics
- **Size**: 18,740 high-quality abstract-question pairs

### Training Configuration

- **Learning Rate**: 2e-5 with cosine scheduling
- **Batch Size**: Effective batch size of 8 (via gradient accumulation)
- **Epochs**: 3 with early stopping
- **Precision**: Mixed precision (BF16)
- **Hardware**: RTX A6000 (48 GB VRAM)
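For readers who want to reproduce or audit the setup, the sketch below shows a PEFT `LoraConfig` and `TrainingArguments` consistent with the specifications above. It is a minimal sketch, not the released training script: the `target_modules` names assume standard LLaMA layer naming, and values such as `lora_dropout` and the per-device batch size / accumulation split are illustrative assumptions not stated in this card.

```python
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

# Adapter configuration matching the specifications table
# (r=32, alpha=64, all seven LLaMA projection matrices).
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,  # assumption: dropout is not stated in this card
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)

# Optimization setup from the training configuration above; the
# per-device/accumulation split (2 x 4 = effective batch of 8) is assumed.
training_args = TrainingArguments(
    output_dir="persiansciqa-lora",
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,  # early stopping, e.g. via transformers.EarlyStoppingCallback
    bf16=True,
    logging_steps=50,
)
```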
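Targeting all seven projection matrices (attention plus MLP) rather than attention alone is what brings the trainable parameter count to roughly 67M, about 0.5% of the base model; setting `lora_alpha` to twice the rank follows a common LoRA scaling convention.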
### Performance Metrics

- **Training Loss Reduction**: >30% improvement over the course of training
- **Validation Stability**: Consistent convergence across epochs
- **Generation Quality**: Coherent, contextually relevant questions

## Usage

### Installation

```bash
pip install transformers peft torch
```

### Basic Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "ViraIntelligentDataMining/PersianLLaMA-13B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/PersianSciQA-LoRA")
tokenizer = AutoTokenizer.from_pretrained("ViraIntelligentDataMining/PersianLLaMA-13B")

# LLaMA tokenizers often ship without a pad token; fall back to EOS
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Build the prompt in the training format: "چکیده:" (Abstract:) ... "سوال:" (Question:)
abstract = "Your Persian scientific abstract here"
prompt = f"چکیده: {abstract}\nسوال:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.pad_token_id
    )

# Decode only the newly generated tokens, skipping the prompt
question = tokenizer.decode(outputs[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(f"Generated Question: {question}")
```

## Evaluation Results

### Qualitative Assessment

- **Relevance**: Generated questions are contextually appropriate to the source abstract
- **Fluency**: Natural Persian sentence structure
- **Complexity**: Difficulty level appropriate for academic content
- **Diversity**: Varied question types

### Training Efficiency

- **Convergence**: Stable training within 3 epochs
- **Memory Efficiency**: ~100 MB adapter vs. ~26 GB full model
- **Training Time**: ~4 hours on an RTX A6000

## Research Applications

### Academic Use Cases

1. **Educational Assessment**: Automatic question generation for Persian scientific courses
2. **Literature Review**: Question formulation for systematic reviews
3. **Research Methodology**: Hypothesis generation from existing literature
4. **Language Technology**: Advancing Persian NLP capabilities

### Technical Advantages

- **Domain Adaptation**: Specialized for scientific vocabulary
- **Efficiency**: Minimal computational requirements for training and storage
- **Transferability**: Compatible with standard PEFT infrastructure
- **Scalability**: Easy integration into larger NLP pipelines

## Citation

For academic use, please cite:

```bibtex
@misc{persiansciqa-lora-2025,
  title={PersianSciQA-LoRA: Scientific Question Generation for Persian Literature},
  author={[Your Name]},
  year={2025},
  url={https://huggingface.co/YOUR_USERNAME/PersianSciQA-LoRA},
  note={LoRA adapter for Persian scientific question generation based on PersianLLaMA-13B}
}
```

## License

Released under the Apache 2.0 License. Academic and research use is encouraged.

## Research Collaboration

We welcome collaboration with Persian-language researchers, educational technology developers, and NLP practitioners working on low-resource languages.

---

*Advancing Persian Academic NLP Through Efficient Fine-tuning*