EviOmni-nq_train-1.5B

Introduction

EviOmni is a rational evidence extraction model. Compared to vanilla evidence extraction models, EviOmni demonstrates superior performance, generalization, efficiency, and robustness.

Requirements

The code of EviOmni has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers.

With transformers<4.37.0, you will encounter the following error:

KeyError: 'qwen2'
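
A quick way to verify your environment before loading the model (this check is our addition, not part of the official quickstart):

import transformers
from packaging import version

# Qwen2 support landed in transformers 4.37.0; older versions raise KeyError: 'qwen2'.
assert version.parse(transformers.__version__) >= version.parse("4.37.0"), \
    f"transformers {transformers.__version__} is too old; upgrade to >= 4.37.0"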

Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import StoppingCriteria, StoppingCriteriaList
import re

class MultiTokenStoppingCriteria(StoppingCriteria):
    """Stops generation once the tail of the sequence matches a multi-token stop sequence."""

    def __init__(self, stop_ids):
        self.stop_ids = stop_ids
        self.stop_len = len(stop_ids)

    def __call__(self, input_ids, scores, **kwargs):
        # Compare the last stop_len generated tokens against the stop sequence.
        if input_ids.shape[1] >= self.stop_len:
            last_tokens = input_ids[0][-self.stop_len:].tolist()
            return last_tokens == self.stop_ids
        return False

model_name = "HIT-TMG/EviOmni-nq_train-1.5B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the prompt template shipped with the model repo and fill in the task inputs.
with open("eviomni_prompt", "r") as f:
    prompt = f.read()
question = "..."   # the question to answer
passages = "..."   # the retrieved passages to extract evidence from
instruction = prompt.format(question=question, passages=passages)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": instruction}
]

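# Stop generation once the closing </extract> tag (plus trailing newlines) has been emitted.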
stop_token = "</extract>\n\n"
stop_ids = tokenizer.encode(stop_token, add_special_tokens=False)

stopping_criteria = StoppingCriteriaList([
    MultiTokenStoppingCriteria(stop_ids)
])

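# Build the chat-formatted prompt string via the model's chat template.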
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

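# Generate until the stop sequence fires or max_new_tokens is reached.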
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    stopping_criteria=stopping_criteria
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
match = re.search(r"<extract>(.*?)</extract>", response, re.DOTALL)
# Fall back to the raw response if the tags are missing.
evidence = match.group(1).strip() if match else response
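
The model wraps its extracted evidence in <extract>...</extract> tags, which the regex above recovers. In a RAG pipeline, this evidence would then replace the raw passages in the prompt of your downstream generator. Below is a minimal sketch of that hand-off; the reader prompt is our assumption, and we reuse EviOmni's tokenizer and model only for brevity (in practice you would call your own generator LLM):

reader_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": f"Answer the question based on the evidence.\n\nEvidence: {evidence}\n\nQuestion: {question}"},
]
reader_text = tokenizer.apply_chat_template(reader_messages, tokenize=False, add_generation_prompt=True)
reader_inputs = tokenizer([reader_text], return_tensors="pt").to(model.device)
answer_ids = model.generate(**reader_inputs, max_new_tokens=128)
# Strip the prompt tokens, keeping only the newly generated answer.
answer = tokenizer.decode(answer_ids[0][reader_inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()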

Performance

Main results. [Figure omitted: see the model card or the paper for the full results.]

Citation

If you find our work helpful, feel free to cite us:

@misc{EviOmni,
      title={Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation}, 
      author={Xinping Zhao and Shouzheng Huang and Yan Zhong and Xinshuo Hu and Meishan Zhang and Baotian Hu and Min Zhang},
      year={2025},
      eprint={2507.15586},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.15586}, 
}