---
license: mit
language:
- en
base_model:
- jhu-clsp/ettin-encoder-32m
pipeline_tag: token-classification
tags:
- token classification
- hallucination detection
- retrieval-augmented generation
- transformers
- ettin
- lightweight
datasets:
- enelpol/rag-mini-bioasq
library_name: transformers
---

# TinyLettuce (Ettin-32M): Efficient Hallucination Detection


- **Model Name:** tinylettuce-ettin-32m-en-v1
- **Organization:** KRLabsOrg
- **GitHub:** https://github.com/KRLabsOrg/LettuceDetect
- **Ettin encoders:** https://arxiv.org/pdf/2507.11412

## Overview

TinyLettuce is a token-classification model that flags unsupported spans in an answer given its context. The 32M Ettin variant balances accuracy with CPU-side efficiency and is designed for low-cost domain fine-tuning on synthetic data. Trained on our synthetic dataset (mixed with RAGTruth), this variant achieves 88.76% F1 on the held-out synthetic test set, outperforming much larger LLM judges such as GPT-OSS-120B and demonstrating the effectiveness of our domain-specific hallucination data generation pipeline.

## Model Details

- Architecture: Ettin encoder (32M) + token-classification head
- Task: token classification (0 = supported, 1 = hallucinated)
- Input: `[CLS] context [SEP] question [SEP] answer [SEP]`, up to 4096 tokens
- Language: English; License: MIT

A hedged sketch of driving this input layout directly through transformers appears at the end of this card.

## Training Data

- Synthetic (train): ~1,500 hallucinated samples (≈3,000 with non-hallucinated) from enelpol/rag-mini-bioasq; intensity 0.3.
- Synthetic (test): 300 hallucinated samples (≈600 total) held out.

## Training Procedure

- Tokenizer: AutoTokenizer; DataCollatorForTokenClassification; label pad −100
- Max length: 4096; batch size: 8; epochs: 3
- Optimizer: AdamW (lr 1e-5, weight_decay 0.01)
- Hardware: single A100 80GB

A minimal configuration sketch for this setup also appears at the end of this card.

## Results

Synthetic (domain-specific) test set:

| Model | Parameters | Precision (%) | Recall (%) | F1 (%) | Hardware |
|-------|------------|---------------|------------|--------|----------|
| TinyLettuce-17M | 17M | 84.56 | 98.21 | 90.87 | CPU |
| **TinyLettuce-32M** | 32M | 80.36 | 99.10 | 88.76 | CPU |
| TinyLettuce-68M | 68M | 89.54 | 95.96 | 92.64 | CPU |
| GPT-5-mini | ~200B | 71.95 | 100.00 | 83.69 | API/GPU |
| GPT-OSS-120B | 120B | 72.21 | 98.64 | 83.38 | GPU |
| Qwen3-235B | 235B | 66.74 | 99.32 | 79.84 | GPU |

## Usage

First install lettucedetect:

```bash
pip install lettucedetect
```

Then use it:

```python
from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/tinylettuce-ettin-32m-en-v1",
)

spans = detector.predict(
    context=[
        "Ibuprofen is an NSAID that reduces inflammation and pain. The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily."
    ],
    question="What is the maximum daily dose of ibuprofen?",
    answer="The maximum daily dose of ibuprofen for adults is 3200mg.",
    output_format="spans",
)
print(spans)
# Output: [{"start": 50, "end": 56, "text": "3200mg"}]
```

## Citing

If you use the model or the tool, please cite the following paper:

```bibtex
@misc{Kovacs:2025,
    title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
    author={Ádám Kovács and Gábor Recski},
    year={2025},
    eprint={2502.17125},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2502.17125},
}
```
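## Direct Use with transformers (sketch)

The lettucedetect wrapper above is the supported path, but the checkpoint is a standard Hugging Face token-classification model. The sketch below is a minimal illustration of the input layout documented under Model Details; packing the context and question into the first segment with an explicit `[SEP]` is an assumption about the wrapper's preprocessing, not a confirmed detail.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "KRLabsOrg/tinylettuce-ettin-32m-en-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
model.eval()

context = (
    "Ibuprofen is an NSAID that reduces inflammation and pain. The typical "
    "adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily."
)
question = "What is the maximum daily dose of ibuprofen?"
answer = "The maximum daily dose of ibuprofen for adults is 3200mg."

# Approximate the documented layout: [CLS] context [SEP] question [SEP] answer [SEP].
enc = tokenizer(
    f"{context} {tokenizer.sep_token} {question}",
    answer,
    truncation=True,
    max_length=4096,
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, seq_len, 2)

# Label 1 = hallucinated; print the tokens the model flags.
preds = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
print([t for t, p in zip(tokens, preds) if p == 1])
```

For production use, prefer the lettucedetect API shown above, which also merges token predictions into character-level spans.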
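## Fine-Tuning Configuration (sketch)

A minimal sketch of the hyperparameters listed under Training Procedure, using the Hugging Face Trainer. The single-example toy dataset is a placeholder for illustration only; the actual synthetic data generation and token-label alignment live in the LettuceDetect repository.

```python
from datasets import Dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

base = "jhu-clsp/ettin-encoder-32m"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForTokenClassification.from_pretrained(base, num_labels=2)

# Placeholder dataset: one tokenised example with every token labelled 0
# (supported). Real training data assigns each token a 0/1 label.
sample = tokenizer("context [SEP] question", "answer", truncation=True, max_length=4096)
sample["labels"] = [0] * len(sample["input_ids"])
train_dataset = Dataset.from_list([dict(sample)])

# Pads inputs dynamically and pads labels with -100 so the loss ignores them.
collator = DataCollatorForTokenClassification(tokenizer, label_pad_token_id=-100)

args = TrainingArguments(
    output_dir="tinylettuce-ettin-32m",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=1e-5,  # AdamW is the Trainer default optimizer
    weight_decay=0.01,
)
Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=train_dataset,
).train()
```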