TinyLettuce (Ettin-32M): Efficient Hallucination Detection


Model Name: tinylettuce-ettin-32m-en-v1

Organization: KRLabsOrg

Github: https://github.com/KRLabsOrg/LettuceDetect

Ettin encoders: https://arxiv.org/pdf/2507.11412

Overview

TinyLettuce is a token‑classification model that flags unsupported spans in an answer given its context. The 32M Ettin variant balances accuracy with efficient CPU inference and is designed for low‑cost, domain‑specific fine‑tuning on synthetic data.

Trained on our synthetic dataset (mixed with RAGTruth), this 32M variant achieves 88.76% F1 on the held‑out synthetic test set, outperforming much larger LLM judges such as GPT‑OSS‑120B and demonstrating the effectiveness of our domain‑specific hallucination data generation pipeline.

Model Details

  • Architecture: Ettin encoder (32M) + token‑classification head
  • Task: token classification (0 = supported, 1 = hallucinated)
  • Input: [CLS] context [SEP] question [SEP] answer [SEP], up to 4096 tokens (see the loading sketch after this list)
  • Language: English; License: MIT
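
For orientation, the snippet below shows one way the checkpoint can be loaded as a standard Hugging Face token‑classification model and how the two‑segment input maps onto per‑token labels. This is a minimal sketch under stated assumptions: the exact prompt assembly and span post‑processing are handled by the lettucedetect package (see Usage), so the segment layout here is an assumption, not the package's internal code.

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_id = "KRLabsOrg/tinylettuce-ettin-32m-en-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)  # 2 labels: 0 = supported, 1 = hallucinated

context = "The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily."
question = "What is the maximum daily dose of ibuprofen?"
answer = "The maximum daily dose of ibuprofen for adults is 3200mg."

# Assumption: context and question form the first segment and the answer the
# second, mirroring the [CLS] context [SEP] question [SEP] answer [SEP] layout above.
enc = tokenizer(context + " " + question, answer, return_tensors="pt",
                truncation=True, max_length=4096)
with torch.no_grad():
    logits = model(**enc).logits          # shape: (1, seq_len, 2)
token_labels = logits.argmax(dim=-1)[0]   # 1 marks tokens predicted as hallucinated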

Training Data

  • Synthetic (train): ~1,500 hallucinated samples (≈3,000 including non‑hallucinated counterparts) generated from enelpol/rag-mini-bioasq at hallucination intensity 0.3 (an illustrative record is sketched after this list).
  • Synthetic (test): 300 hallucinated samples (≈600 total) held out.
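
To make the format concrete, here is an illustrative record for a synthetic hallucinated sample; the field names are hypothetical, not the dataset's actual schema. The answer contains a span ("3200mg") that the context does not support, annotated with character offsets into the answer.

# Illustrative only: field names are hypothetical, not the dataset's schema.
span_text = "3200mg"
answer = "The maximum daily dose of ibuprofen for adults is 3200mg."
start = answer.index(span_text)
sample = {
    "context": "The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily.",
    "question": "What is the maximum daily dose of ibuprofen?",
    "answer": answer,
    "labels": [{"start": start, "end": start + len(span_text), "text": span_text}],
}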

Training Procedure

  • Tokenizer: AutoTokenizer; DataCollatorForTokenClassification; label pad −100
  • Max length: 4096; batch size: 8; epochs: 3
  • Optimizer: AdamW (lr 1e‑5, weight_decay 0.01)
  • Hardware: single A100 80GB (a minimal Trainer sketch follows this list)
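
The hyperparameters above map almost directly onto a standard Hugging Face Trainer setup. The sketch below is an approximation under stated assumptions (the base checkpoint name and the pre‑tokenized train_dataset are placeholders), not the repository's actual training script.

from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

def build_trainer(train_dataset, base="jhu-clsp/ettin-encoder-32m"):
    # train_dataset: pre-tokenized examples (max_length=4096) with per-token
    # labels (0 = supported, 1 = hallucinated); padded label positions use -100.
    # base: assumed name of the 32M Ettin encoder checkpoint.
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForTokenClassification.from_pretrained(base, num_labels=2)
    collator = DataCollatorForTokenClassification(tokenizer, label_pad_token_id=-100)
    args = TrainingArguments(
        output_dir="tinylettuce-ettin-32m",
        learning_rate=1e-5,               # AdamW is the Trainer default optimizer
        weight_decay=0.01,
        per_device_train_batch_size=8,
        num_train_epochs=3,
    )
    return Trainer(model=model, args=args, data_collator=collator,
                   train_dataset=train_dataset)

# trainer = build_trainer(synthetic_train_set); trainer.train()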

Results

Synthetic (domain‑specific):

| Model           | Parameters | Precision (%) | Recall (%) | F1 (%) | Hardware |
|-----------------|------------|---------------|------------|--------|----------|
| TinyLettuce-17M | 17M        | 84.56         | 98.21      | 90.87  | CPU      |
| TinyLettuce-32M | 32M        | 80.36         | 99.10      | 88.76  | CPU      |
| TinyLettuce-68M | 68M        | 89.54         | 95.96      | 92.64  | CPU      |
| GPT-5-mini      | ~200B      | 71.95         | 100.00     | 83.69  | API/GPU  |
| GPT-OSS-120B    | 120B       | 72.21         | 98.64      | 83.38  | GPU      |
| Qwen3-235B      | 235B       | 66.74         | 99.32      | 79.84  | GPU      |
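
For reference, precision, recall, and F1 treat hallucination as the positive class. The helper below is an illustrative pure‑Python version of that computation over binary labels; the exact evaluation granularity (token‑, span‑, or example‑level) follows the LettuceDetect paper rather than this sketch.

def precision_recall_f1(gold, pred):
    # gold, pred: parallel lists of 0/1 labels, 1 meaning hallucinated.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1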

Usage

First install lettucedetect:

pip install lettucedetect

Then use it:

from lettucedetect.models.inference import HallucinationDetector

# Span-level detector backed by the 32M TinyLettuce checkpoint.
detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/tinylettuce-ettin-32m-en-v1",
)

# The context caps the daily dose at 2400mg while the answer claims 3200mg,
# so that span should be flagged as unsupported.
spans = detector.predict(
    context=[
        "Ibuprofen is an NSAID that reduces inflammation and pain. The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily."
    ],
    question="What is the maximum daily dose of ibuprofen?",
    answer="The maximum daily dose of ibuprofen for adults is 3200mg.",
    output_format="spans",
)
print(spans)
# Output: [{"start": 51, "end": 57, "text": "3200mg"}]
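
Assuming the returned start/end offsets index into the answer string (as in the example output above), a small helper like the one below, hypothetical and not part of lettucedetect, can highlight the flagged spans for display.

def highlight(answer, spans, left="[[", right="]]"):
    # Wrap each flagged span in markers, working right-to-left so earlier offsets stay valid.
    for s in sorted(spans, key=lambda s: s["start"], reverse=True):
        answer = answer[:s["start"]] + left + answer[s["start"]:s["end"]] + right + answer[s["end"]:]
    return answer

print(highlight("The maximum daily dose of ibuprofen for adults is 3200mg.", spans))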

Citing

If you use the model or the tool, please cite the following paper:

@misc{Kovacs:2025,
      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications}, 
      author={Ádám Kovács and Gábor Recski},
      year={2025},
      eprint={2502.17125},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17125}, 
}