---
license: mit
language:
- en
base_model:
- jhu-clsp/ettin-encoder-68m
pipeline_tag: token-classification
tags:
- token classification
- hallucination detection
- retrieval-augmented generation
- transformers
- ettin
- lightweight
datasets:
- ragtruth
- KRLabsOrg/rag-bioasq-lettucedetect
library_name: transformers
---

# TinyLettuce (Ettin-68M): Efficient Hallucination Detection

<p align="center">
<img src="https://github.com/KRLabsOrg/LettuceDetect/blob/dev/assets/tinytinylettuce.png?raw=true" alt="TinyLettuce" width="400"/>
</p>

**Model Name:** tinylettuce-ettin-68m-en

**Organization:** KRLabsOrg

**GitHub:** https://github.com/KRLabsOrg/LettuceDetect

**Ettin encoders:** https://arxiv.org/pdf/2507.11412

## Overview

TinyLettuce is a lightweight token‑classification model that flags unsupported spans in an answer given its context (span aggregation is performed downstream). Built on the 68M Ettin encoder, it targets real‑time CPU inference and low‑cost domain fine‑tuning.

This variant is trained on the RAGTruth dataset for hallucination detection, pairing the 68M Ettin encoder with a token‑classification head. It achieves the highest accuracy among the TinyLettuce sizes and performs well for its size (74.97% F1 vs. 76.07% for LettuceDetect-ModernBERT-base), and it is optimized for efficient CPU inference.
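
Since the model emits per‑token labels and span aggregation happens downstream, the merging step can be sketched roughly as follows. This is an illustrative sketch, not the library's actual implementation; the token offsets below are invented for the example:

```python
def tokens_to_spans(labels, offsets):
    """Merge runs of tokens labeled 1 (hallucinated) into character spans.

    labels: per-token predictions (0 = supported, 1 = hallucinated)
    offsets: (start, end) character offsets of each token in the answer
    """
    spans, current = [], None
    for (start, end), label in zip(offsets, labels):
        if label == 1:
            if current is None:
                current = [start, end]  # open a new span
            else:
                current[1] = end  # extend the running span
        elif current is not None:
            spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans

# Toy example: only the token covering "3200mg" is flagged.
answer = "The maximum daily dose is 3200mg."
offsets = [(0, 3), (4, 11), (12, 17), (18, 22), (23, 25), (26, 32), (32, 33)]
labels = [0, 0, 0, 0, 0, 1, 0]
print([answer[s:e] for s, e in tokens_to_spans(labels, offsets)])  # ['3200mg']
```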

## Model Details

- Architecture: Ettin encoder (68M) + token‑classification head
- Task: token classification (0 = supported, 1 = hallucinated)
- Input: [CLS] context [SEP] question [SEP] answer [SEP], up to 4096 tokens
- Language: English; License: MIT
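
The input layout above can be mirrored in a short sketch. This is conceptual only; in practice the tokenizer inserts the [CLS]/[SEP] special tokens itself:

```python
def pack_example(context: str, question: str, answer: str) -> str:
    """Conceptual [CLS] context [SEP] question [SEP] answer [SEP] layout."""
    return f"[CLS] {context} [SEP] {question} [SEP] {answer} [SEP]"

print(pack_example("Ibuprofen is an NSAID.", "Maximum daily dose?", "2400mg."))
# [CLS] Ibuprofen is an NSAID. [SEP] Maximum daily dose? [SEP] 2400mg. [SEP]
```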

## Training Data

- RAGTruth (English) with span‑level labels; no synthetic data mixed in

## Training Procedure

- Tokenizer: AutoTokenizer; DataCollatorForTokenClassification; label pad −100
- Max length: 4096; batch size: 16; epochs: 5
- Optimizer: AdamW (lr 1e‑5, weight_decay 0.01)
- Hardware: single A100 80GB
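
The −100 label pad mentioned above is the standard way to mask special and padding tokens out of the token‑classification loss. A minimal sketch of the alignment, assuming a fast tokenizer's `word_ids()` output where `None` marks special/padding positions:

```python
IGNORE_INDEX = -100  # ignored by the cross-entropy loss, matching the collator's label pad

def align_labels(word_labels, word_ids):
    """Project word-level labels onto subword tokens; specials/pads get -100."""
    return [IGNORE_INDEX if wid is None else word_labels[wid] for wid in word_ids]

# Toy example: the third word is hallucinated and splits into two subwords.
word_labels = [0, 0, 1]
word_ids = [None, 0, 1, 2, 2, None]  # [CLS], w0, w1, w2a, w2b, [SEP]
print(align_labels(word_labels, word_ids))  # [-100, 0, 0, 1, 1, -100]
```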

## Results (RAGTruth)

This model is designed primarily for fine‑tuning on smaller, domain‑specific samples rather than for general‑purpose use.

It performs well on the RAGTruth benchmark, coming close to our LettuceDetect-base (150M ModernBERT) model.

| Model | Parameters | F1 (%) |
|-------|------------|--------|
| **TinyLettuce-68M** | 68M | **74.97** |
| LettuceDetect-base (ModernBERT) | 150M | 76.07 |
| LettuceDetect-large (ModernBERT) | 395M | 79.22 |
| Llama-2-13B (RAGTruth FT) | 13B | 78.70 |

## Usage

First install lettucedetect:

```bash
pip install lettucedetect
```

Then use it:

```python
from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/tinylettuce-ettin-68m-en",
)

spans = detector.predict(
    context=[
        "Ibuprofen is an NSAID that reduces inflammation and pain. The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily."
    ],
    question="What is the maximum daily dose of ibuprofen?",
    answer="The maximum daily dose of ibuprofen for adults is 3200mg.",
    output_format="spans",
)
print(spans)
# Output: [{"start": 51, "end": 57, "text": "3200mg"}]
```

## Citing

If you use the model or the tool, please cite the following paper:

```bibtex
@misc{Kovacs:2025,
  title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
  author={Ádám Kovács and Gábor Recski},
  year={2025},
  eprint={2502.17125},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.17125},
}
```