mendrika261 committed on
Commit 0c795e8 · verified · 1 Parent(s): 1dbc7c7

Update README.md

Files changed (1)
  1. README.md +109 -4
README.md CHANGED
@@ -10,12 +10,117 @@ license: apache-2.0
  language:
  - en
  ---

- # Uploaded model

- - **Developed by:** mendrika261
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/qwen3-8b-unsloth-bnb-4bit

  This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
  language:
  - en
  ---
+ # RAG Context Evaluator - Qwen3-8B Fine-tuned 🚀

+ ## Model Details 📋

+ **License:** apache-2.0
+ **Finetuned from model:** unsloth/qwen3-8b-unsloth-bnb-4bit
+ **Model type:** Text Generation (Specialized for RAG Evaluation)
+ **Quantization:** Q8_0
+
+ ## Model Description 🎯
+
+ This model is fine-tuned specifically to evaluate the quality of retrieved contexts in Retrieval-Augmented Generation (RAG) systems. It assesses retrieved passages against user queries on several metrics commonly used in information retrieval and RAG evaluation.
+
+ ## Intended Uses 💡
+
+ ### Primary Use Case 🎯
+ - **RAG System Evaluation**: Automatically assess the quality of retrieved contexts for question-answering systems
+ - **Information Retrieval Quality Control**: Evaluate how well retrieved documents match user queries
+ - **Academic Research**: Support research in information retrieval and RAG system optimization
+
+ ### Evaluation Metrics 📊
+ The model evaluates retrieved contexts using the following metrics (a short reference sketch of the two ranking metrics, MRR and NDCG, follows the list):
+
+ 1. **Completeness** 📝 - How thoroughly the retrieved context addresses the query
+ 2. **Clarity** ✨ - How clear and understandable the retrieved information is
+ 3. **Conciseness** 🎪 - How efficiently the information is presented without redundancy
+ 4. **Precision** 🎯 - How accurate and relevant the retrieved information is
+ 5. **Recall** 🔍 - How comprehensively the retrieved information covers the query
+ 6. **MRR (Mean Reciprocal Rank)** 📈 - Ranking quality of relevant results
+ 7. **NDCG (Normalized Discounted Cumulative Gain)** 📊 - Ranking quality that accounts for position
+ 8. **Relevance** 🔗 - Overall relevance of retrieved contexts to the query
+
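For reference, the two ranking metrics above have standard definitions. The short, self-contained sketch below is illustrative only; it is not part of the model or its training code. It computes MRR and NDCG from binary relevance labels over a ranked list of retrieved passages.

```python
import math

def mrr(relevance: list) -> float:
    """Reciprocal rank of the first relevant passage (1-indexed); 0.0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ndcg(relevance: list) -> float:
    """DCG of the retrieved order divided by the DCG of the ideal (sorted) order."""
    dcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevance, start=1))
    ideal = sorted(relevance, reverse=True)
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# 1 = passage relevant to the query, 0 = not relevant, in retrieval order.
print(mrr([0, 1, 0]))   # 0.5  (first relevant passage is at rank 2)
print(ndcg([0, 1, 1]))  # ~0.69 (relevant passages ranked below an irrelevant one)
```

In the model's output these two metrics appear as judgements rather than computed scores, so the sketch is only meant to make the definitions concrete.
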
+ ## Training Data 📚
+
+ ### Example Training Instance
+ ```json
+ {
+   "instruction": "Evaluate the agent's response according to the metrics: completeness, clarity, conciseness, precision, recall, mrr, ndcg, relevance",
+   "input": {
+     "question": "Question about retrieved context",
+     "retrieved_contexts": "[Multiple numbered passages with source citations]"
+   },
+   "output": [
+     {
+       "name": "completeness",
+       "value": 1,
+       "comment": "Detailed evaluation comment"
+     }
+     // ... other metrics
+   ]
+ }
+ ```
+
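To illustrate how the `output` field of such an instance (or a model response in the same format) might be consumed downstream, here is a minimal, hypothetical parser. The helper name and the assumption that the response is a plain JSON array are mine, not part of the released model card.

```python
import json

def parse_metrics(raw: str) -> dict:
    """Convert a JSON array of {"name", "value", "comment"} objects,
    as in the training example above, into a name -> details mapping."""
    return {
        m["name"]: {"value": m["value"], "comment": m.get("comment", "")}
        for m in json.loads(raw)
    }

# Example with a single metric; a real response would contain all eight.
raw = '[{"name": "completeness", "value": 1, "comment": "Covers the query fully"}]'
print(parse_metrics(raw)["completeness"]["value"])  # -> 1
```
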
+ ## Performance and Limitations ⚡
+
+ ### Strengths
+ - Specialized for RAG evaluation
+ - Multi-dimensional assessment capability
+ - Detailed explanatory comments for each metric
+
+ ### Limitations
+ - **Context Length**: Performance may vary with very long retrieved contexts (see the truncation sketch below)
+
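A simple mitigation is to clip each retrieved passage to a token budget before building the evaluation prompt. The sketch below is an illustrative assumption, not an official recommendation: the repository name is taken from the usage example further down, and the 512-token budget is arbitrary.

```python
from transformers import AutoTokenizer

# Repository name as used in the usage example below (assumption).
tokenizer = AutoTokenizer.from_pretrained("mendrika261/rag-evaluator-qwen3-8b")

def clip_passage(passage: str, max_tokens: int = 512) -> str:
    """Keep only the first max_tokens tokens of a retrieved passage."""
    ids = tokenizer.encode(passage, add_special_tokens=False)
    return tokenizer.decode(ids[:max_tokens])

retrieved_contexts = [clip_passage(p) for p in ["passage 1 ...", "passage 2 ..."]]
```
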
+ ## Ethical Considerations 🤝
+
+ - The model should be used as a tool to assist human evaluators, not replace human judgment entirely
+ - Evaluations should be validated by domain experts for critical applications
+
+ ## Technical Specifications 🔧
+
+ - **Base Model**: Qwen3-8B
+ - **Quantization**: Q8_0
+
+ ## Usage Example 💻
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_name = "mendrika261/rag-evaluator-qwen3-8b"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
+
+ # Example evaluation prompt
+ prompt = """Evaluate the agent's response according to the metrics: completeness, clarity, conciseness, precision, recall, mrr, ndcg, relevance
+
+ Question: [Your question here]
+ Retrieved contexts: [Your retrieved contexts here]"""
+
+ # Generate the evaluation; without max_new_tokens the default limit cuts the output short
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=1024)
+ # Decode only the newly generated tokens, skipping the echoed prompt
+ evaluation = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ ```
+
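Qwen3 fine-tunes are often trained with a chat template rather than raw text. If that applies here (an assumption; the card does not say), the same prompt can be wrapped with `tokenizer.apply_chat_template`, reusing the objects loaded above:

```python
# Reuses `tokenizer`, `model`, and `prompt` from the example above; chat-format training is assumed.
messages = [{"role": "user", "content": prompt}]
chat_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(chat_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
evaluation = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```
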
+ ## Citation 📄
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{mendrika261-rag-evaluator,
+   title={RAG Context Evaluator - Qwen3-8B Fine-tuned},
+   author={mendrika261},
+   year={2025},
+   howpublished={\url{https://huggingface.co/mendrika261/rag-evaluation}}
+ }
+ ```
+
+ ## Contact 📧
+
+ For questions or issues regarding this model, please contact the developer through the Hugging Face model repository.
+
+ ---

  This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.