TinyLettuce (Ettin-32M): Efficient Hallucination Detection


Model Name: tinylettuce-ettin-32m-en-v1

Organization: KRLabsOrg

Github: https://github.com/KRLabsOrg/LettuceDetect

Ettin encoders: https://arxiv.org/pdf/2507.11412

Overview

TinyLettuce is a token‑classification model that flags unsupported spans in an answer given its context. The 32M Ettin variant balances accuracy with efficient CPU inference and is designed for low‑cost, domain‑specific fine‑tuning on synthetic data.

Trained on our synthetic dataset (mixed with RAGTruth), this 32M variant achieves 88.76% F1 on the held‑out synthetic test set, outperforming much larger LLM judges such as GPT‑OSS‑120B and demonstrating the effectiveness of our domain‑specific hallucination data generation pipeline.

Model Details

  • Architecture: Ettin encoder (32M) + token‑classification head
  • Task: token classification (0 = supported, 1 = hallucinated)
  • Input: [CLS] context [SEP] question [SEP] answer [SEP], up to 4096 tokens (see the loading sketch after this list)
  • Language: English; License: MIT
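
For orientation, the snippet below shows one way the checkpoint can be loaded as a standard Hugging Face token‑classification model and how the two‑segment input maps onto per‑token labels. This is a minimal sketch under stated assumptions: the exact prompt assembly and span post‑processing are handled by the lettucedetect package (see Usage), so the segment layout here is an assumption, not the package's internal code.

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_id = "KRLabsOrg/tinylettuce-ettin-32m-en-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)  # 2 labels: 0 = supported, 1 = hallucinated

context = "The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily."
question = "What is the maximum daily dose of ibuprofen?"
answer = "The maximum daily dose of ibuprofen for adults is 3200mg."

# Assumption: context and question form the first segment and the answer the
# second, mirroring the [CLS] context [SEP] question [SEP] answer [SEP] layout above.
enc = tokenizer(context + " " + question, answer, return_tensors="pt",
                truncation=True, max_length=4096)
with torch.no_grad():
    logits = model(**enc).logits          # shape: (1, seq_len, 2)
token_labels = logits.argmax(dim=-1)[0]   # 1 marks tokens predicted as hallucinated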

Training Data

  • Synthetic (train): ~1,500 hallucinated samples (≈3,000 including non‑hallucinated counterparts) generated from enelpol/rag-mini-bioasq at hallucination intensity 0.3 (an illustrative record is sketched after this list).
  • Synthetic (test): 300 hallucinated samples (≈600 total) held out.
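
To make the format concrete, here is an illustrative record for a synthetic hallucinated sample; the field names are hypothetical, not the dataset's actual schema. The answer contains a span ("3200mg") that the context does not support, annotated with character offsets into the answer.

# Illustrative only: field names are hypothetical, not the dataset's schema.
span_text = "3200mg"
answer = "The maximum daily dose of ibuprofen for adults is 3200mg."
start = answer.index(span_text)
sample = {
    "context": "The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily.",
    "question": "What is the maximum daily dose of ibuprofen?",
    "answer": answer,
    "labels": [{"start": start, "end": start + len(span_text), "text": span_text}],
}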

Training Procedure

  • Tokenizer: AutoTokenizer; DataCollatorForTokenClassification; label pad −100
  • Max length: 4096; batch size: 8; epochs: 3
  • Optimizer: AdamW (lr 1e‑5, weight_decay 0.01)
  • Hardware: single A100 80GB (a minimal Trainer sketch follows this list)
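
The hyperparameters above map almost directly onto a standard Hugging Face Trainer setup. The sketch below is an approximation under stated assumptions (the base checkpoint name and the pre‑tokenized train_dataset are placeholders), not the repository's actual training script.

from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

def build_trainer(train_dataset, base="jhu-clsp/ettin-encoder-32m"):
    # train_dataset: pre-tokenized examples (max_length=4096) with per-token
    # labels (0 = supported, 1 = hallucinated); padded label positions use -100.
    # base: assumed name of the 32M Ettin encoder checkpoint.
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForTokenClassification.from_pretrained(base, num_labels=2)
    collator = DataCollatorForTokenClassification(tokenizer, label_pad_token_id=-100)
    args = TrainingArguments(
        output_dir="tinylettuce-ettin-32m",
        learning_rate=1e-5,               # AdamW is the Trainer default optimizer
        weight_decay=0.01,
        per_device_train_batch_size=8,
        num_train_epochs=3,
    )
    return Trainer(model=model, args=args, data_collator=collator,
                   train_dataset=train_dataset)

# trainer = build_trainer(synthetic_train_set); trainer.train()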

Results

Synthetic (domain‑specific):

| Model           | Parameters | Precision (%) | Recall (%) | F1 (%) | Hardware |
|-----------------|------------|---------------|------------|--------|----------|
| TinyLettuce-17M | 17M        | 84.56         | 98.21      | 90.87  | CPU      |
| TinyLettuce-32M | 32M        | 80.36         | 99.10      | 88.76  | CPU      |
| TinyLettuce-68M | 68M        | 89.54         | 95.96      | 92.64  | CPU      |
| GPT-5-mini      | ~200B      | 71.95         | 100.00     | 83.69  | API/GPU  |
| GPT-OSS-120B    | 120B       | 72.21         | 98.64      | 83.38  | GPU      |
| Qwen3-235B      | 235B       | 66.74         | 99.32      | 79.84  | GPU      |
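
For reference, precision, recall, and F1 treat hallucination as the positive class. The helper below is an illustrative pure‑Python version of that computation over binary labels; the exact evaluation granularity (token‑, span‑, or example‑level) follows the LettuceDetect paper rather than this sketch.

def precision_recall_f1(gold, pred):
    # gold, pred: parallel lists of 0/1 labels, 1 meaning hallucinated.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1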

Usage

First install lettucedetect:

pip install lettucedetect

Then use it:

from lettucedetect.models.inference import HallucinationDetector

# Span-level detector backed by the 32M TinyLettuce checkpoint.
detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/tinylettuce-ettin-32m-en-v1",
)

# The context caps the daily dose at 2400mg while the answer claims 3200mg,
# so that span should be flagged as unsupported.
spans = detector.predict(
    context=[
        "Ibuprofen is an NSAID that reduces inflammation and pain. The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily."
    ],
    question="What is the maximum daily dose of ibuprofen?",
    answer="The maximum daily dose of ibuprofen for adults is 3200mg.",
    output_format="spans",
)
print(spans)
# Output: [{"start": 51, "end": 57, "text": "3200mg"}]
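
Assuming the returned start/end offsets index into the answer string (as in the example output above), a small helper like the one below, hypothetical and not part of lettucedetect, can highlight the flagged spans for display.

def highlight(answer, spans, left="[[", right="]]"):
    # Wrap each flagged span in markers, working right-to-left so earlier offsets stay valid.
    for s in sorted(spans, key=lambda s: s["start"], reverse=True):
        answer = answer[:s["start"]] + left + answer[s["start"]:s["end"]] + right + answer[s["end"]:]
    return answer

print(highlight("The maximum daily dose of ibuprofen for adults is 3200mg.", spans))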

Citing

If you use the model or the tool, please cite the following paper:

@misc{Kovacs:2025,
      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications}, 
      author={Ádám Kovács and Gábor Recski},
      year={2025},
      eprint={2502.17125},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17125}, 
}