---
license: mit
language:
- en
base_model:
- jhu-clsp/ettin-encoder-17m
pipeline_tag: token-classification
tags:
- token classification
- hallucination detection
- retrieval-augmented generation
- transformers
- ettin
- lightweight
datasets:
- KRLabsOrg/rag-bioasq-lettucedetect
library_name: transformers
---

# TinyLettuce (Ettin-17M): Efficient Hallucination Detection

<p align="center">
<img src="https://github.com/KRLabsOrg/LettuceDetect/blob/dev/assets/tinytinylettuce.png?raw=true" alt="TinyLettuce" width="400"/>
</p>

**Model Name:** tinylettuce-ettin-17m-en-bioasq

**Organization:** KRLabsOrg

**GitHub:** https://github.com/KRLabsOrg/LettuceDetect

**Ettin encoders:** https://arxiv.org/pdf/2507.11412

## Overview

TinyLettuce is a lightweight token-classification model that flags answer spans unsupported by the given context (span aggregation is performed downstream). Built on the 17M-parameter Ettin encoder, it targets real-time CPU inference and low-cost domain fine-tuning on synthetic data.

Trained on our synthetic dataset, this 17M variant achieves 90.87% F1 on the held-out synthetic test set, beating large LLM judges such as GPT-OSS-120B and demonstrating the effectiveness of our domain-specific hallucination data generation pipeline.

## Model Details

- Architecture: Ettin encoder (17M parameters) + token-classification head
- Task: token classification (0 = supported, 1 = hallucinated)
- Input format: `[CLS] context [SEP] question [SEP] answer [SEP]`, up to 4096 tokens
- Language: English; License: MIT
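
The input layout can be made concrete with a small sketch; the string below only illustrates the sequence order (a real tokenizer inserts the special tokens itself when encoding text pairs):

```python
# Illustrative only: shows how context, question, and answer are ordered
# in the single sequence the model classifies token by token.
def build_input(context: str, question: str, answer: str) -> str:
    return f"[CLS] {context} [SEP] {question} [SEP] {answer} [SEP]"

print(build_input("Ibuprofen reduces pain.", "Max dose?", "3200mg."))
```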

## Training Data

- Synthetic generation (train): ~1,500 hallucinated samples (≈3,000 total including non-hallucinated) generated from enelpol/rag-mini-bioasq at intensity 0.3.
- Synthetic generation (test): 300 hallucinated samples (≈600 total) held out.
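
One common way to produce such pairs is to corrupt facts in grounded answers with some probability while keeping token-level labels. The sketch below is hypothetical (`inject_hallucinations` and its numbers-only corruption are assumptions, not the actual pipeline), with `intensity` playing the role of the 0.3 setting above:

```python
import random

def inject_hallucinations(tokens, intensity=0.3, rng=None):
    """Corrupt numeric tokens with probability `intensity`; label corrupted tokens 1."""
    rng = rng or random.Random(0)
    out, labels = [], []
    for tok in tokens:
        if tok.isdigit() and rng.random() < intensity:
            # Shift the number so the answer is no longer supported by the source.
            out.append(str(int(tok) + rng.randint(1, 9) * 100))
            labels.append(1)
        else:
            out.append(tok)
            labels.append(0)
    return out, labels
```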

## Training Procedure

- Tokenizer: AutoTokenizer with DataCollatorForTokenClassification; label padding −100 (ignored by the loss)
- Max length: 4096 tokens; batch size: 8; epochs: 3
- Optimizer: AdamW (lr 1e-5, weight decay 0.01)
- Hardware: single A100 80GB for training; CPU targeted for inference
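
These settings map onto a standard `transformers` fine-tune. A hedged configuration sketch (a config fragment, not the actual training script — dataset loading and tokenization are omitted, and `train_ds` is a placeholder):

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("jhu-clsp/ettin-encoder-17m")
model = AutoModelForTokenClassification.from_pretrained(
    "jhu-clsp/ettin-encoder-17m", num_labels=2  # 0 = supported, 1 = hallucinated
)

args = TrainingArguments(
    output_dir="tinylettuce-17m",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=1e-5,
    weight_decay=0.01,
)

# The collator pads label sequences with -100 so padding is ignored by the loss.
collator = DataCollatorForTokenClassification(tokenizer)

# trainer = Trainer(model=model, args=args, data_collator=collator, train_dataset=train_ds)
# trainer.train()
```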

## Results

Synthetic (domain-specific) test set:

| Model | Parameters | Precision (%) | Recall (%) | F1 (%) | Hardware |
|-------|------------|---------------|------------|--------|----------|
| **TinyLettuce-17M** | 17M | 84.56 | 98.21 | 90.87 | CPU |
| TinyLettuce-32M | 32M | 80.36 | 99.10 | 88.76 | CPU |
| TinyLettuce-68M | 68M | 89.54 | 95.96 | 92.64 | CPU |
| GPT-5-mini | ~200B | 71.95 | 100.00 | 83.69 | API/GPU |
| GPT-OSS-120B | 120B | 72.21 | 98.64 | 83.38 | GPU |
| Qwen3-235B | 235B | 66.74 | 99.32 | 79.84 | GPU |

Notes: "Synthetic" metrics reflect generated data; absolute scores depend on post-processing thresholds and domain.
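
The F1 column is the harmonic mean of precision and recall, so any row can be sanity-checked directly:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (here in percent)."""
    return 2 * precision * recall / (precision + recall)

# TinyLettuce-17M row: P = 84.56, R = 98.21
print(f1(84.56, 98.21))  # ~90.9, matching the table's 90.87 up to rounding
```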

## Usage

First install lettucedetect:

```bash
pip install lettucedetect
```

Then use it:

```python
from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/tinylettuce-ettin-17m-en-bioasq",
)

spans = detector.predict(
    context=[
        "Ibuprofen is an NSAID that reduces inflammation and pain. The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily."
    ],
    question="What is the maximum daily dose of ibuprofen?",
    answer="The maximum daily dose of ibuprofen for adults is 3200mg.",
    output_format="spans",
)
print(spans)
# Example output: [{"start": ..., "end": ..., "text": "3200mg"}]
```
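
The span dictionaries above come from aggregating the model's token-level 0/1 predictions downstream. A minimal sketch of that post-processing, assuming per-token character offsets are available (the library's actual aggregation logic may differ):

```python
def aggregate_spans(text, token_offsets, labels):
    """Merge consecutive tokens labeled 1 (hallucinated) into character spans."""
    spans = []
    for (start, end), label in zip(token_offsets, labels):
        if label != 1:
            continue
        # Extend the previous span when this token is contiguous with it
        # (allowing a single separator character such as a space).
        if spans and start - spans[-1]["end"] <= 1:
            spans[-1]["end"] = end
        else:
            spans.append({"start": start, "end": end})
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans
```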

## Citing

If you use the model or the tool, please cite the following paper:

```bibtex
@misc{Kovacs:2025,
      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
      author={Ádám Kovács and Gábor Recski},
      year={2025},
      eprint={2502.17125},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17125},
}
```