---
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
tags:
- text-generation
- rag
- evaluation
- information-retrieval
- question-answering
- retrieval-augmented-generation
- context-evaluation
- qwen3
- unsloth
- fine-tuned
language:
- en
- multilingual
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
model_type: qwen3
quantized: q8_0
datasets:
- constehub/rag-evaluation-dataset
metrics:
- completeness
- clarity
- conciseness
- precision
- recall
- mrr
- ndcg
- relevance
widget:
- example_title: "RAG Context Evaluation"
  text: |
    Evaluate the agent's response according to the metrics: completeness, clarity, conciseness, precision, recall, mrr, ndcg, relevance
    
    Question: What are the main benefits of renewable energy?
    Retrieved contexts: [1] Renewable energy sources like solar and wind power provide clean alternatives to fossil fuels, reducing greenhouse gas emissions and air pollution. [2] These energy sources are sustainable and abundant, helping to ensure long-term energy security.
model-index:
- name: RAG Context Evaluator
  results:
  - task:
      type: text-generation
      name: RAG Evaluation
    metrics:
    - type: evaluation_score
      name: Multi-metric Assessment
      value: 0-5
---

# RAG Context Evaluator - Qwen3-8B Fine-tuned 🚀

## Model Details 📋

**License:** apache-2.0  
**Finetuned from model:** unsloth/qwen3-8b-unsloth-bnb-4bit  
**Model type:** Text Generation (Specialized for RAG Evaluation)  
**Quantization:** Q8_0

## Model Description 🎯

This model is specifically fine-tuned to evaluate the quality of retrieved contexts in Retrieval-Augmented Generation (RAG) systems. It assesses retrieved passages against user queries using multiple evaluation metrics commonly used in information retrieval and RAG evaluation.

## Intended Uses 💡

### Primary Use Case 🎯
- **RAG System Evaluation**: Automatically assess the quality of retrieved contexts for question-answering systems
- **Information Retrieval Quality Control**: Evaluate how well retrieved documents match user queries
- **Academic Research**: Support research in information retrieval and RAG system optimization

### Evaluation Metrics 📊
The model evaluates retrieved contexts using the following metrics:

1. **Completeness** 📝 - How thoroughly the retrieved context addresses the query
2. **Clarity** ✨ - How clear and understandable the retrieved information is
3. **Conciseness** 🎪 - How efficiently the information is presented, without redundancy
4. **Precision** 🎯 - How accurate and relevant the retrieved information is
5. **Recall** 🔍 - How comprehensively the retrieved information covers the query
6. **MRR (Mean Reciprocal Rank)** 📈 - Ranking quality based on the position of the first relevant result
7. **NDCG (Normalized Discounted Cumulative Gain)** 📊 - Ranking quality with position-weighted gains
8. **Relevance** 🔗 - Overall relevance of the retrieved contexts to the query
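For reference, MRR and NDCG are the standard information-retrieval formulas; the sketch below shows how they are conventionally computed (this is illustrative only, and not necessarily the exact scoring the model was trained to emulate):

```python
import math

def mrr(relevance):
    """Mean Reciprocal Rank: 1/rank of the first relevant result, 0 if none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ndcg(gains, k=None):
    """Normalized DCG: DCG of the given ranking divided by DCG of the ideal ranking."""
    k = k or len(gains)
    dcg = sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, `mrr([0, 0, 1])` is `1/3` (first relevant hit at rank 3), and `ndcg` returns `1.0` whenever the ranking is already ideal.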

## Training Data 📚

[constehub/rag-evaluation-dataset](https://huggingface.co/datasets/constehub/rag-evaluation-dataset)

### Example Training Instance
```json
{
  "instruction": "Evaluate the agent's response according to the metrics: completeness, clarity, conciseness, precision, recall, mrr, ndcg, relevance",
  "input": {
    "question": "Question about retrieved context",
    "retrieved_contexts": "[Multiple numbered passages with source citations]"
  },
  "output": [
    {
      "name": "completeness",
      "value": 1,
      "comment": "Detailed evaluation comment"
    }
    // ... other metrics
  ]
}
```
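Assuming the instance schema above, one way to flatten such a record into an inference prompt is sketched below (field names are taken from the example; the exact prompt template used during training is not published here):

```python
def build_prompt(instance):
    """Flatten a training-style instance into a plain-text evaluation prompt."""
    question = instance["input"]["question"]
    contexts = instance["input"]["retrieved_contexts"]
    return (
        f"{instance['instruction']}\n\n"
        f"Question: {question}\n"
        f"Retrieved contexts: {contexts}"
    )

instance = {
    "instruction": "Evaluate the agent's response according to the metrics: "
                   "completeness, clarity, conciseness, precision, recall, mrr, ndcg, relevance",
    "input": {
        "question": "What are the main benefits of renewable energy?",
        "retrieved_contexts": "[1] Solar and wind power reduce emissions.",
    },
}
prompt = build_prompt(instance)
```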

## Performance and Limitations ⚡

### Strengths
- Specialized for RAG evaluation
- Multi-dimensional assessment capability
- Detailed explanatory comments for each metric

### Limitations
- **Context Length**: Performance may vary with very long retrieved contexts

## Ethical Considerations 🤝

- The model should be used as a tool to assist human evaluators, not replace human judgment entirely
- Evaluations should be validated by domain experts for critical applications

## Technical Specifications 🔧

- **Base Model**: Qwen3-8B
- **Quantization**: Q8_0

## Usage Example 💻

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "mendrika261/rag-evaluator-qwen3-8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Example evaluation prompt
prompt = """Evaluate the agent's response according to the metrics: completeness, clarity, conciseness, precision, recall, mrr, ndcg, relevance

Question: [Your question here]
Retrieved contexts: [Your retrieved contexts here]"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt
evaluation = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(evaluation)
```
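If the model emits its scores in the same JSON-array shape as the training output shown earlier (an assumption; verify against your actual generations), the evaluation text can be parsed back into structured metrics like this:

```python
import json
import re

def parse_evaluation(text):
    """Extract the first JSON array from the model output and map
    metric name -> (value, comment). Assumes the training output format."""
    match = re.search(r"\[.*\]", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in model output")
    metrics = json.loads(match.group(0))
    return {m["name"]: (m["value"], m.get("comment", "")) for m in metrics}

sample = '''Here is the evaluation:
[{"name": "completeness", "value": 4, "comment": "Covers most points."},
 {"name": "relevance", "value": 5, "comment": "Directly on topic."}]'''
scores = parse_evaluation(sample)
```

The greedy regex grabs everything between the first `[` and the last `]`, which is fine for a single-array response but would need tightening if the model also emits bracketed citations.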

## Citation 📄

If you use this model in your research, please cite:

```bibtex
@misc{constehub-rag-evaluator,
  title={RAG Context Evaluator - Qwen3-8B Fine-tuned},
  author={constehub},
  year={2025},
  howpublished={\url{https://huggingface.co/constehub/rag-evaluation}}
}
```

## Contact 📧

For questions or issues regarding this model, please contact the developer through the Hugging Face model repository.

---

This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)