E²R-FLOPs: Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
Abstract
E²R-FLOPs evaluates LLM-based rerankers by measuring relevance and throughput per PetaFLOP, providing hardware-agnostic metrics for efficiency and effectiveness.
Large Language Models (LLMs) have recently been applied to reranking tasks in information retrieval, achieving strong performance. However, their high computational demands often hinder practical deployment. Existing studies evaluate the efficiency of LLM-based rerankers using proxy metrics such as latency, the number of forward passes, and the number of input and output tokens. These metrics, however, depend on hardware and runtime choices (e.g., degree of parallelism, batch size) and often fail to account for model size, making them difficult to interpret and obscuring the evaluation of the efficiency-effectiveness trade-off. To address this issue, we propose E²R-FLOPs for LLM-based rerankers: ranking metrics per PetaFLOP (RPP) for relevance per unit of compute, and queries per PetaFLOP (QPP) for hardware-agnostic throughput. Alongside the new metrics, we build an interpretable FLOPs estimator that can estimate the FLOPs of an LLM-based reranker without running any experiments. Based on the proposed metrics, we conduct comprehensive experiments to evaluate a wide range of LLM-based rerankers with different architectures, studying the efficiency-effectiveness trade-off and bringing this issue to the attention of the research community.
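As a rough illustration of how such metrics can be computed (a minimal sketch, not the paper's exact estimator), the snippet below uses the common approximation of roughly 2 × parameters × tokens FLOPs per dense-LLM forward pass. The function names, the 7B model size, and the NDCG@10 value are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch of RPP/QPP computation, assuming:
#   - forward-pass FLOPs ≈ 2 * n_params * n_tokens (a standard dense-LLM estimate),
#   - NDCG@10 as the ranking metric; the paper's estimator may differ in detail.

PETA = 1e15  # FLOPs per PetaFLOP


def estimate_flops(n_params: float, input_tokens: int, output_tokens: int) -> float:
    """Approximate FLOPs for one reranking forward pass of a dense decoder LLM."""
    return 2.0 * n_params * (input_tokens + output_tokens)


def rpp(ranking_metric: float, total_flops: float) -> float:
    """Ranking metric per PetaFLOP (relevance per unit of compute)."""
    return ranking_metric / (total_flops / PETA)


def qpp(num_queries: int, total_flops: float) -> float:
    """Queries per PetaFLOP (hardware-agnostic throughput)."""
    return num_queries / (total_flops / PETA)


# Example: a hypothetical 7B-parameter pointwise reranker scoring
# 100 candidate passages (one forward pass each) for a single query.
flops_per_pass = estimate_flops(n_params=7e9, input_tokens=512, output_tokens=1)
total = 100 * flops_per_pass
print(f"NDCG@10 per PetaFLOP: {rpp(0.45, total):.3f}")
print(f"Queries per PetaFLOP: {qpp(1, total):.3f}")
```

Because both metrics divide by the same compute budget, a larger model must deliver proportionally higher relevance to match a smaller model's RPP, which is exactly the trade-off the proxy metrics obscure.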
Community
This paper proposes E²R-FLOPs, a framework for evaluating the efficiency of LLM-based rerankers using hardware-agnostic metrics: ranking metrics per PetaFLOP (RPP) and queries per PetaFLOP (QPP). Unlike existing proxy metrics (e.g., latency or token count), these metrics account for model size and compute cost. To support them, the paper introduces an interpretable FLOPs estimator, enabling efficiency analysis without running the model. Comprehensive experiments highlight the efficiency-effectiveness trade-off across diverse LLM rerankers, promoting more interpretable and fair comparisons.
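To make the architectural trade-off concrete, here is a hedged back-of-envelope comparison of two common reranking strategies under the same 2 × parameters × tokens approximation as above. All numbers (model size, candidate count, prompt lengths) are hypothetical, and the paper's actual estimator and configurations may differ.

```python
# Hypothetical cost comparison of reranking strategies; numbers are illustrative.

def forward_flops(n_params: float, n_tokens: int) -> float:
    """Approximate FLOPs of one dense-LLM forward pass over n_tokens."""
    return 2.0 * n_params * n_tokens

N_PARAMS = 7e9   # hypothetical 7B-parameter reranker
N = 100          # candidate passages per query
TOKENS = 512     # tokens per (query + passage) prompt

# Pointwise: one forward pass per candidate.
pointwise = N * forward_flops(N_PARAMS, TOKENS + 1)

# Pairwise (all pairs): one pass per candidate pair, two passages per prompt.
pairwise = (N * (N - 1) // 2) * forward_flops(N_PARAMS, 2 * TOKENS + 1)

print(f"pointwise: {pointwise / 1e15:.2f} PFLOPs per query")  # ~0.72
print(f"pairwise:  {pairwise / 1e15:.2f} PFLOPs per query")   # ~71
# All-pairs pairwise reranking costs ~100x more compute per query at the
# same model size; QPP makes that gap directly comparable across hardware.
```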