|
--- |
|
license: mit |
|
language: |
|
- en |
|
base_model: |
|
- jhu-clsp/ettin-encoder-68m |
|
pipeline_tag: token-classification |
|
tags: |
|
- token classification |
|
- hallucination detection |
|
- retrieval-augmented generation |
|
- transformers |
|
- ettin |
|
- lightweight |
|
datasets: |
|
- enelpol/rag-mini-bioasq |
|
library_name: transformers |
|
--- |
|
|
|
# TinyLettuce (Ettin-68M): Efficient Hallucination Detection |
|
|
|
<p align="center"> |
|
<img src="https://github.com/KRLabsOrg/LettuceDetect/blob/dev/assets/tinytinylettuce.png?raw=true" alt="TinyLettuce" width="400"/> |
|
</p> |
|
|
|
**Model Name:** tinylettuce-ettin-68m-en-v1 |
|
|
|
**Organization:** KRLabsOrg |
|
|
|
**GitHub:** https://github.com/KRLabsOrg/LettuceDetect
|
|
|
**Ettin encoders:** https://arxiv.org/pdf/2507.11412 |
|
|
|
## Overview |
|
|
|
The 68M Ettin variant provides the highest accuracy among the TinyLettuce models while remaining CPU‑friendly. It detects spans in a generated answer that are not supported by the provided context, and it is optimized for low‑cost deployment and fine‑tuning.
|
|
|
Trained on our synthetic dataset (mixed with RAGTruth), this 68M variant reaches 92.64% F1 on the held‑out synthetic test set, demonstrating the effectiveness of our domain‑specific hallucination data generation pipeline.
|
|
|
## Model Details |
|
|
|
- Architecture: Ettin encoder (68M) + token‑classification head |
|
- Task: token classification (0 = supported, 1 = hallucinated) |
|
- Input: `[CLS] context [SEP] question [SEP] answer [SEP]`, up to 4096 tokens
|
- Language: English; License: MIT |
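
Under this scheme the model emits one 0/1 label per answer token, and a downstream step merges consecutive label‑1 tokens into character spans. A minimal sketch of that merging step (the helper name and the `(start, end)` offset format are illustrative assumptions, not part of the library API):

```python
def merge_token_labels(offsets, labels):
    """Merge consecutive hallucinated tokens (label 1) into character spans.

    offsets: per-token (start, end) character offsets into the answer,
             e.g. from a fast tokenizer's return_offsets_mapping.
    labels:  one 0/1 label per token (1 = hallucinated).
    """
    spans = []
    for (start, end), label in zip(offsets, labels):
        if label != 1:
            continue
        # Extend the open span if this token is adjacent to it (allowing a
        # single separating character such as a space); else open a new span.
        if spans and start - spans[-1][1] <= 1:
            spans[-1][1] = end
        else:
            spans.append([start, end])
    return [tuple(span) for span in spans]
```

In practice the library returns such spans directly (see Usage below); the sketch only illustrates how token labels relate to character spans.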
|
|
|
## Training Data |
|
|
|
- Synthetic (train): ~1,500 hallucinated samples (≈3,000 incl. non‑hallucinated) from enelpol/rag-mini-bioasq; intensity 0.3. |
|
- Synthetic (test): 300 hallucinated samples (≈600 total) held out. |
|
|
|
## Training Procedure |
|
|
|
- Tokenizer: AutoTokenizer; DataCollatorForTokenClassification; label pad −100 |
|
- Max length: 4096; batch size: 8; epochs: 3–6 |
|
- Optimizer: AdamW (lr 1e‑5, weight_decay 0.01) |
|
- Hardware: Single A100 80GB |
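
The label pad value of −100 follows the PyTorch convention: cross‑entropy skips positions whose label equals `ignore_index` (−100 by default), which is what `DataCollatorForTokenClassification` relies on when padding label sequences. A pure‑Python sketch of that padding (the helper name is illustrative):

```python
def pad_label_batch(batch, ignore_index=-100):
    """Right-pad variable-length label sequences to the batch maximum.

    Padded positions receive ignore_index (-100), so the token-classification
    loss ignores them during training.
    """
    max_len = max(len(seq) for seq in batch)
    return [seq + [ignore_index] * (max_len - len(seq)) for seq in batch]
```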
|
|
|
## Results |
|
|
|
Evaluation on the held‑out synthetic test set (domain‑specific):
|
|
|
| Model | Parameters | Precision (%) | Recall (%) | F1 (%) | Hardware | |
|
|-------|------------|---------------|------------|--------|----------| |
|
| TinyLettuce-17M | 17M | 84.56 | 98.21 | 90.87 | CPU | |
|
| TinyLettuce-32M | 32M | 80.36 | 99.10 | 88.76 | CPU | |
|
| **TinyLettuce-68M** | 68M | 89.54 | 95.96 | **92.64** | CPU | |
|
| GPT-5-mini | ~200B | 71.95 | 100.00 | 83.69 | API | |
|
| GPT-OSS-120B | 120B | 72.21 | 98.64 | 83.38 | GPU | |
|
| Qwen3-235B | 235B | 66.74 | 99.32 | 79.84 | GPU | |
|
|
|
## Usage |
|
|
|
First install lettucedetect: |
|
|
|
```bash |
|
pip install lettucedetect |
|
``` |
|
|
|
Then use it: |
|
|
|
```python |
|
from lettucedetect.models.inference import HallucinationDetector |
|
|
|
detector = HallucinationDetector( |
|
method="transformer", |
|
model_path="KRLabsOrg/tinylettuce-ettin-68m-en-v1", |
|
) |
|
|
|
spans = detector.predict( |
|
context=[ |
|
"Ibuprofen is an NSAID that reduces inflammation and pain. The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily." |
|
], |
|
question="What is the maximum daily dose of ibuprofen?", |
|
answer="The maximum daily dose of ibuprofen for adults is 3200mg.", |
|
output_format="spans", |
|
) |
|
print(spans) |
|
# Output: [{"start": 51, "end": 57, "text": "3200mg"}] |
|
``` |
|
|
|
## Citing |
|
|
|
If you use the model or the tool, please cite the following paper: |
|
|
|
```bibtex |
|
@misc{Kovacs:2025, |
|
title={LettuceDetect: A Hallucination Detection Framework for RAG Applications}, |
|
author={Ádám Kovács and Gábor Recski}, |
|
year={2025}, |
|
eprint={2502.17125}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2502.17125}, |
|
} |
|
``` |