---
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
tags:
- text-generation
- rag
- evaluation
- information-retrieval
- question-answering
- retrieval-augmented-generation
- context-evaluation
- qwen3
- unsloth
- fine-tuned
language:
- en
- multilingual
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
model_type: qwen3
quantized: q8_0
datasets:
- evaluation
- rag-evaluation
metrics:
- completeness
- clarity
- conciseness
- precision
- recall
- mrr
- ndcg
- relevance
widget:
- example_title: "RAG Context Evaluation"
  text: |
    Evaluate the agent's response according to the metrics: completeness, clarity, conciseness, precision, recall, mrr, ndcg, relevance

    Question: What are the main benefits of renewable energy?

    Retrieved contexts:
    [1] Renewable energy sources like solar and wind power provide clean alternatives to fossil fuels, reducing greenhouse gas emissions and air pollution.
    [2] These energy sources are sustainable and abundant, helping to ensure long-term energy security.
model-index:
- name: RAG Context Evaluator
  results:
  - task:
      type: text-generation
      name: RAG Evaluation
    metrics:
    - type: evaluation_score
      name: Multi-metric Assessment
      value: 0-5
---

# RAG Context Evaluator - Qwen3-8B Fine-tuned 🚀

## Model Details 📋

- **License:** apache-2.0
- **Finetuned from model:** unsloth/qwen3-8b-unsloth-bnb-4bit
- **Model type:** Text Generation (Specialized for RAG Evaluation)
- **Quantization:** Q8_0

## Model Description 🎯

This model is fine-tuned specifically to evaluate the quality of retrieved contexts in Retrieval-Augmented Generation (RAG) systems. It assesses retrieved passages against user queries using multiple evaluation metrics commonly used in information retrieval and RAG evaluation.

## Intended Uses 💡

### Primary Use Case 🎯

- **RAG System Evaluation**: Automatically assess the quality of retrieved contexts for question-answering systems
- **Information Retrieval Quality Control**: Evaluate how well retrieved documents match user queries
- **Academic Research**: Support research in information retrieval and RAG system optimization

### Evaluation Metrics 📊

The model evaluates retrieved contexts using the following metrics:

1. **Completeness** 📝 - How thoroughly the retrieved context addresses the query
2. **Clarity** ✨ - How clear and understandable the retrieved information is
3. **Conciseness** 🎪 - How efficiently the information is presented, without redundancy
4. **Precision** 🎯 - How accurate and relevant the retrieved information is
5. **Recall** 🔍 - How comprehensively the retrieved information covers the query
6. **MRR (Mean Reciprocal Rank)** 📈 - Ranking quality based on the position of the first relevant result
7. **NDCG (Normalized Discounted Cumulative Gain)** 📊 - Ranking quality that accounts for result position
8. **Relevance** 🔗 - Overall relevance of the retrieved contexts to the query

(Reference sketches of the rank-based MRR and NDCG computations appear at the end of this card.)

## Training Data 📚

### Example Training Instance

```json
{
  "instruction": "Evaluate the agent's response according to the metrics: completeness, clarity, conciseness, precision, recall, mrr, ndcg, relevance",
  "input": {
    "question": "Question about retrieved context",
    "retrieved_contexts": "[Multiple numbered passages with source citations]"
  },
  "output": [
    {
      "name": "completeness",
      "value": 1,
      "comment": "Detailed evaluation comment"
    }
    // ... other metrics
  ]
}
```
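
Because the training targets are JSON arrays of per-metric objects, the fine-tuned model is expected to emit output in the same shape. The snippet below is a minimal sketch of how such output could be turned into a score dictionary; it assumes the generated text contains a JSON array like the one above (the `// ...` comment in the example is not valid JSON and is stripped), and the helper name `parse_metric_scores` is purely illustrative, not part of the model's API.

```python
import json
import re


def parse_metric_scores(generated_text: str) -> dict:
    """Extract {metric_name: value} pairs from the model's JSON-style output.

    Assumes the output contains a JSON array of objects with
    "name", "value", and "comment" fields, as in the training example.
    """
    # Grab the first [...] span in the generated text.
    match = re.search(r"\[.*\]", generated_text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON array found in model output")

    # Remove any //-style comments, which are not valid JSON.
    cleaned = re.sub(r"//[^\n]*", "", match.group(0))
    entries = json.loads(cleaned)
    return {entry["name"]: entry["value"] for entry in entries}


# Hypothetical model output used only to demonstrate the parser.
example_output = """[
  {"name": "completeness", "value": 4, "comment": "Covers most aspects of the query."},
  {"name": "relevance", "value": 5, "comment": "Directly on topic."}
]"""
print(parse_metric_scores(example_output))
# {'completeness': 4, 'relevance': 5}
```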
## Performance and Limitations ⚡

### Strengths

- Specialized for RAG evaluation
- Multi-dimensional assessment capability
- Detailed explanatory comments for each metric

### Limitations

- **Context Length**: Performance may vary with very long retrieved contexts

## Ethical Considerations 🤝

- The model should be used as a tool to assist human evaluators, not to replace human judgment entirely
- Evaluations should be validated by domain experts for critical applications

## Technical Specifications 🔧

- **Base Model**: Qwen3-8B
- **Quantization**: Q8_0

## Usage Example 💻

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "mendrika261/rag-evaluator-qwen3-8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example evaluation prompt
prompt = """Evaluate the agent's response according to the metrics: completeness, clarity, conciseness, precision, recall, mrr, ndcg, relevance

Question: [Your question here]

Retrieved contexts:
[Your retrieved contexts here]"""

inputs = tokenizer(prompt, return_tensors="pt")
# Allow enough new tokens for the full multi-metric evaluation
outputs = model.generate(**inputs, max_new_tokens=512)
evaluation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(evaluation)
```

## Citation 📄

If you use this model in your research, please cite:

```bibtex
@misc{mendrika261-rag-evaluator,
  title={RAG Context Evaluator - Qwen3-8B Fine-tuned},
  author={mendrika261},
  year={2025},
  howpublished={\url{https://huggingface.co/mendrika261/rag-evaluation}}
}
```

## Contact 📧

For questions or issues regarding this model, please contact the developer through the Hugging Face model repository.

---

This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
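
## Appendix: MRR and NDCG Reference 📐

The rank-based metrics listed under Evaluation Metrics have standard textbook formulations. The sketch below is a minimal, self-contained reference for interpreting those scores; it is not part of this model or its training code. `relevant_flags` marks which retrieved contexts are relevant, and `relevance` holds graded relevance labels, both in ranked retrieval order.

```python
import math


def mean_reciprocal_rank(relevant_flags: list) -> float:
    """Reciprocal rank of the first relevant item in one ranked list (0.0 if none)."""
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def ndcg(relevance: list, k: int = None) -> float:
    """NDCG@k for one ranked list of graded relevance labels."""
    k = k or len(relevance)

    def dcg(labels):
        # DCG = sum of rel_i / log2(i + 1) over the top-k positions.
        return sum(rel / math.log2(rank + 1)
                   for rank, rel in enumerate(labels[:k], start=1))

    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0


# Example: the second retrieved context is the first relevant one.
print(mean_reciprocal_rank([False, True, False]))  # 0.5
print(round(ndcg([0, 3, 1], k=3), 3))              # graded labels in retrieval order
```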