---
license: mit
language:
- en
base_model:
- jhu-clsp/ettin-encoder-68m
pipeline_tag: token-classification
tags:
- token classification
- hallucination detection
- retrieval-augmented generation
- transformers
- ettin
- lightweight
datasets:
- enelpol/rag-mini-bioasq
library_name: transformers
---
# TinyLettuce (Ettin-68M): Efficient Hallucination Detection

**Model Name:** tinylettuce-ettin-68m-en-v1
**Organization:** KRLabsOrg
**GitHub:** https://github.com/KRLabsOrg/LettuceDetect
**Ettin encoders:** https://arxiv.org/pdf/2507.11412
## Overview
The 68M Ettin variant provides the highest accuracy among the TinyLettuce models while remaining CPU-friendly. It detects unsupported spans in an answer given the retrieved context, and is optimized for low-cost deployment and fine-tuning.

Trained on our synthetic dataset (mixed with RAGTruth), this 68M variant reaches 92.64% F1 on the held-out synthetic test set, demonstrating the effectiveness of our domain-specific hallucination data generation pipeline.
## Model Details
- Architecture: Ettin encoder (68M) + token‑classification head
- Task: token classification (0 = supported, 1 = hallucinated)
- Input: `[CLS] context [SEP] question [SEP] answer [SEP]`, up to 4096 tokens
- Language: English; License: MIT
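
If you want to call the checkpoint directly through `transformers` rather than through lettucedetect, the sketch below shows the raw token-classification interface. Note that it only approximates the segment packing described above (the `HallucinationDetector` in the Usage section handles the exact packing for you, and is the recommended path); the single-string context+question layout here is an illustrative assumption.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "KRLabsOrg/tinylettuce-ettin-68m-en-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

context = (
    "Ibuprofen is an NSAID that reduces inflammation and pain. The typical adult "
    "dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily."
)
question = "What is the maximum daily dose of ibuprofen?"
answer = "The maximum daily dose of ibuprofen for adults is 3200mg."

# Pack the evidence (context + question) as the first segment and the answer as
# the second, so answer tokens are classified against the evidence.
enc = tokenizer(
    context + " " + question,
    answer,
    return_tensors="pt",
    truncation=True,
    max_length=4096,
)

with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, seq_len, 2)

# Label 1 = hallucinated; collect the flagged sub-word tokens.
preds = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
flagged = [tok for tok, p in zip(tokens, preds.tolist()) if p == 1]
print(flagged)
```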
## Training Data
- Synthetic (train): ~1,500 hallucinated samples (≈3,000 incl. non‑hallucinated) from enelpol/rag-mini-bioasq; intensity 0.3.
- Synthetic (test): 300 hallucinated samples (≈600 total) held out.
## Training Procedure
- Tokenizer: AutoTokenizer with DataCollatorForTokenClassification; label padding value -100 (ignored by the loss)
- Max length: 4096; batch size: 8; epochs: 3–6
- Optimizer: AdamW (lr 1e‑5, weight_decay 0.01)
- Hardware: Single A100 80GB
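
For readers fine-tuning their own variant, here is a minimal sketch using the settings above. It is not the exact training script: the toy dataset construction and column handling are illustrative assumptions, and real training uses token-level labels aligned to the answer span of each synthetic example.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

base_model = "jhu-clsp/ettin-encoder-68m"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForTokenClassification.from_pretrained(base_model, num_labels=2)

# Pads inputs and pads labels with -100 so padded positions are ignored by the loss.
collator = DataCollatorForTokenClassification(tokenizer, label_pad_token_id=-100)

# Toy example only: labels are 0 = supported, 1 = hallucinated, assigned per token
# over the packed context/question/answer sequence.
example = tokenizer("context question", "answer", truncation=True, max_length=4096)
example["labels"] = [0] * len(example["input_ids"])
train_dataset = [example]

args = TrainingArguments(
    output_dir="tinylettuce-68m-finetune",
    learning_rate=1e-5,
    weight_decay=0.01,
    per_device_train_batch_size=8,
    num_train_epochs=6,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=collator,  # Trainer's default optimizer is AdamW
)
trainer.train()
```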
## Results
Synthetic (domain‑specific):
| Model | Parameters | Precision (%) | Recall (%) | F1 (%) | Hardware |
|---|---|---|---|---|---|
| TinyLettuce-17M | 17M | 84.56 | 98.21 | 90.87 | CPU |
| TinyLettuce-32M | 32M | 80.36 | 99.10 | 88.76 | CPU |
| TinyLettuce-68M | 68M | 89.54 | 95.96 | 92.64 | CPU |
| GPT-5-mini | ~200B | 71.95 | 100.00 | 83.69 | API |
| GPT-OSS-120B | 120B | 72.21 | 98.64 | 83.38 | GPU |
| Qwen3-235B | 235B | 66.74 | 99.32 | 79.84 | GPU |
## Usage
First install lettucedetect:

```bash
pip install lettucedetect
```
Then use it:

```python
from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/tinylettuce-ettin-68m-en-v1",
)

spans = detector.predict(
    context=[
        "Ibuprofen is an NSAID that reduces inflammation and pain. The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily."
    ],
    question="What is the maximum daily dose of ibuprofen?",
    answer="The maximum daily dose of ibuprofen for adults is 3200mg.",
    output_format="spans",
)

print(spans)
# Output: [{"start": 51, "end": 57, "text": "3200mg"}]
```
## Citing
If you use the model or the tool, please cite the following paper:
```bibtex
@misc{Kovacs:2025,
  title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
  author={Ádám Kovács and Gábor Recski},
  year={2025},
  eprint={2502.17125},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.17125},
}
```