arxiv:2510.12117

Locket: Robust Feature-Locking Technique for Language Models

Published on Oct 14
· Submitted by Lipeng (Tony) He on Oct 15

Abstract

Chatbot providers (e.g., OpenAI) rely on tiered subscription schemes to generate revenue, offering basic models for free users and advanced models for paying subscribers. However, a finer-grained pay-to-unlock scheme for premium features (e.g., math, coding) is thought to be more economically viable for the providers. Such a scheme requires a feature-locking technique (FLoTE) which is (i) effective in refusing locked features, (ii) utility-preserving for unlocked features, (iii) robust against evasion or unauthorized credential sharing, and (iv) scalable to multiple features and users. However, existing FLoTEs (e.g., password-locked models) are not robust or scalable. We present Locket, the first robust and scalable FLoTE to enable pay-to-unlock schemes. Locket uses a novel merging approach to attach adapters to an LLM for refusing unauthorized features. Our comprehensive evaluation shows that Locket is effective (100% refusal on locked features), utility-preserving (≤ 7% utility degradation on unlocked features), robust (≤ 5% attack success rate), and scales to multiple features and clients.
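The abstract does not spell out the merging mechanics, so the following is only a minimal sketch of the general idea: one LoRA "refusal" adapter per lockable feature, folded into the weights of the model served to each client. The adapter paths, feature names, and base model id are hypothetical, and the paper's actual merging approach may differ.

```python
# Minimal sketch, NOT the paper's exact method: merge a per-feature LoRA
# "refusal" adapter into the base weights for every feature the client has
# not unlocked. Paths, feature names, and the base model id are hypothetical.
from transformers import AutoModelForCausalLM
from peft import PeftModel

ALL_FEATURES = {"math", "coding", "medical"}

def build_client_model(base_id: str, unlocked: set[str]):
    model = AutoModelForCausalLM.from_pretrained(base_id)
    for feature in ALL_FEATURES - unlocked:
        # Attach the adapter trained to refuse prompts for this feature...
        model = PeftModel.from_pretrained(model, f"adapters/refuse-{feature}")
        # ...then fold its weights into the base model, so the served model
        # carries no detachable adapter module a client could simply strip off.
        model = model.merge_and_unload()
    return model

# e.g., a client who has paid to unlock coding, but not math or medical:
client_model = build_client_model("your-base-llm", unlocked={"coding"})
```

Folding the adapters into the weights, rather than gating features with a removable module or a password, is presumably what the robustness criterion targets: there is no switch to flip or component to delete at inference time.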

Community

Paper author · Paper submitter

The widespread adoption of #LLM chatbot services has created a large and diverse user base, driving up computing and operational costs. Providers rely on tiered subscription plans to generate revenue 💰, offering black-box access to basic models for free users and advanced models to paying subscribers.

However, this all-or-nothing approach is unprofitable for providers and inflexible for users.

A pay-to-unlock scheme 🔓 for premium features (e.g., math, coding) and specific model capabilities (e.g., medical diagnosis, age-gating) offers a more sustainable alternative; a toy sketch of how a plan maps to locked features follows the list below. In this work, we present a feature-locking technique (FLoTE) that is:

  • Effective in refusing locked features,
  • Utility-preserving for unlocked features,
  • Robust against evasion or unauthorized credential sharing, and
  • Scalable to multiple features and clients.
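To make the plan-to-lock mapping concrete, here is a self-contained toy sketch; the plan names and feature names are illustrative, not from the paper. Each client's plan resolves to the set of features whose refusal adapters would be merged into their model:

```python
# Toy entitlement mapping (illustrative names, not from the paper): a plan
# resolves to the set of locked features whose refusal adapters get merged.
ALL_FEATURES = frozenset({"math", "coding", "medical"})

PLANS = {
    "free":  frozenset(),            # nothing unlocked
    "coder": frozenset({"coding"}),  # coding unlocked
    "pro":   ALL_FEATURES,           # everything unlocked
}

def locked_features(plan: str) -> frozenset:
    """Features to refuse for this client, i.e., adapters to merge in."""
    return ALL_FEATURES - PLANS.get(plan, frozenset())

assert locked_features("coder") == {"math", "medical"}
assert locked_features("free") == ALL_FEATURES
assert locked_features("pro") == frozenset()
```

One refusal adapter per feature, shared across all clients, is what would let such a scheme scale: adding a plan means choosing a subset of adapters to merge, not training a separate model.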

This work represents an initial step towards more fine-grained control of generative model behaviour, potentially enabling many future applications.

