|
--- |
|
license: mit |
|
language: |
|
- en |
|
base_model: |
|
- jhu-clsp/ettin-encoder-68m |
|
pipeline_tag: token-classification |
|
tags: |
|
- token classification |
|
- hallucination detection |
|
- retrieval-augmented generation |
|
- transformers |
|
- ettin |
|
- lightweight |
|
datasets: |
|
- enelpol/rag-mini-bioasq |
|
library_name: transformers |
|
--- |
|
|
|
# TinyLettuce (Ettin-68M): Efficient Hallucination Detection |
|
|
|
<p align="center"> |
|
<img src="https://github.com/KRLabsOrg/LettuceDetect/blob/dev/assets/tinytinylettuce.png?raw=true" alt="TinyLettuce" width="400"/> |
|
</p> |
|
|
|
**Model Name:** tinylettuce-ettin-68m-en-v1 |
|
|
|
**Organization:** KRLabsOrg |
|
|
|
**GitHub:** https://github.com/KRLabsOrg/LettuceDetect
|
|
|
**Ettin encoders:** https://arxiv.org/pdf/2507.11412 |
|
|
|
## Overview |
|
|
|
The 68M Ettin variant provides the highest accuracy among the TinyLettuce models while remaining CPU‑friendly. It detects spans in a generated answer that are not supported by the provided context, and it is optimized for low‑cost deployment and fine‑tuning.
|
|
|
Trained on our synthetic dataset (mixed with RAGTruth), this 68M variant reaches 92.64% F1 on the held‑out synthetic test set, demonstrating the effectiveness of our domain‑specific hallucination data generation pipeline.
|
|
|
## Model Details |
|
|
|
- Architecture: Ettin encoder (68M) + token‑classification head |
|
- Task: token classification (0 = supported, 1 = hallucinated) |
|
- Input: `[CLS] context [SEP] question [SEP] answer [SEP]`, up to 4096 tokens
|
- Language: English; License: MIT |
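
Under this scheme the model emits one 0/1 label per answer token, and a downstream step merges consecutive label‑1 tokens into character spans. A minimal sketch of that merging step (the helper name and the `(start, end)` offset format are illustrative assumptions, not part of the library API):

```python
def merge_token_labels(offsets, labels):
    """Merge consecutive hallucinated tokens (label 1) into character spans.

    offsets: per-token (start, end) character offsets into the answer,
             e.g. from a fast tokenizer's return_offsets_mapping.
    labels:  one 0/1 label per token (1 = hallucinated).
    """
    spans = []
    for (start, end), label in zip(offsets, labels):
        if label != 1:
            continue
        # Extend the open span if this token is adjacent to it (allowing a
        # single separating character such as a space); else open a new span.
        if spans and start - spans[-1][1] <= 1:
            spans[-1][1] = end
        else:
            spans.append([start, end])
    return [tuple(span) for span in spans]
```

In practice the library returns such spans directly (see Usage below); the sketch only illustrates how token labels relate to character spans.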
|
|
|
## Training Data |
|
|
|
- Synthetic (train): ~1,500 hallucinated samples (≈3,000 incl. non‑hallucinated) from enelpol/rag-mini-bioasq; intensity 0.3. |
|
- Synthetic (test): 300 hallucinated samples (≈600 total) held out. |
|
|
|
## Training Procedure |
|
|
|
- Tokenizer: AutoTokenizer; DataCollatorForTokenClassification; label pad −100 |
|
- Max length: 4096; batch size: 8; epochs: 3–6 |
|
- Optimizer: AdamW (lr 1e‑5, weight_decay 0.01) |
|
- Hardware: Single A100 80GB |
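
The label pad value of −100 follows the PyTorch convention: cross‑entropy skips positions whose label equals `ignore_index` (−100 by default), which is what `DataCollatorForTokenClassification` relies on when padding label sequences. A pure‑Python sketch of that padding (the helper name is illustrative):

```python
def pad_label_batch(batch, ignore_index=-100):
    """Right-pad variable-length label sequences to the batch maximum.

    Padded positions receive ignore_index (-100), so the token-classification
    loss ignores them during training.
    """
    max_len = max(len(seq) for seq in batch)
    return [seq + [ignore_index] * (max_len - len(seq)) for seq in batch]
```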
|
|
|
## Results |
|
|
|
Evaluation on the held‑out synthetic test set (domain‑specific):
|
|
|
| Model | Parameters | Precision (%) | Recall (%) | F1 (%) | Hardware | |
|
|-------|------------|---------------|------------|--------|----------| |
|
| TinyLettuce-17M | 17M | 84.56 | 98.21 | 90.87 | CPU | |
|
| TinyLettuce-32M | 32M | 80.36 | 99.10 | 88.76 | CPU | |
|
| **TinyLettuce-68M** | 68M | 89.54 | 95.96 | **92.64** | CPU | |
|
| GPT-5-mini | ~200B | 71.95 | 100.00 | 83.69 | API | |
|
| GPT-OSS-120B | 120B | 72.21 | 98.64 | 83.38 | GPU | |
|
| Qwen3-235B | 235B | 66.74 | 99.32 | 79.84 | GPU | |
|
|
|
## Usage |
|
|
|
First install lettucedetect: |
|
|
|
```bash |
|
pip install lettucedetect |
|
``` |
|
|
|
Then use it: |
|
|
|
```python |
|
from lettucedetect.models.inference import HallucinationDetector |
|
|
|
detector = HallucinationDetector( |
|
method="transformer", |
|
model_path="KRLabsOrg/tinylettuce-ettin-68m-en-v1", |
|
) |
|
|
|
spans = detector.predict( |
|
context=[ |
|
"Ibuprofen is an NSAID that reduces inflammation and pain. The typical adult dose is 400-600mg every 6-8 hours, not exceeding 2400mg daily." |
|
], |
|
question="What is the maximum daily dose of ibuprofen?", |
|
answer="The maximum daily dose of ibuprofen for adults is 3200mg.", |
|
output_format="spans", |
|
) |
|
print(spans) |
|
# Output: [{"start": 51, "end": 57, "text": "3200mg"}] |
|
``` |
|
|
|
## Citing |
|
|
|
If you use the model or the tool, please cite the following paper: |
|
|
|
```bibtex |
|
@misc{Kovacs:2025, |
|
title={LettuceDetect: A Hallucination Detection Framework for RAG Applications}, |
|
author={Ádám Kovács and Gábor Recski}, |
|
year={2025}, |
|
eprint={2502.17125}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2502.17125}, |
|
} |
|
``` |