Locket: Robust Feature-Locking Technique for Language Models
Abstract
Chatbot providers (e.g., OpenAI) rely on tiered subscription schemes to generate revenue, offering basic models to free users and advanced models to paying subscribers. However, a finer-grained pay-to-unlock scheme for premium features (e.g., math, coding) is thought to be more economically viable for providers. Such a scheme requires a feature-locking technique (FLoTE) which is (i) effective in refusing locked features, (ii) utility-preserving for unlocked features, (iii) robust against evasion or unauthorized credential sharing, and (iv) scalable to multiple features and users. However, existing FLoTEs (e.g., password-locked models) are not robust or scalable. We present Locket, the first robust and scalable FLoTE to enable pay-to-unlock schemes. Locket uses a novel merging approach to attach adapters to an LLM for refusing unauthorized features. Our comprehensive evaluation shows that Locket is effective (100% refusal on locked features), utility-preserving (≤7% utility degradation on unlocked features), robust (≤5% attack success rate), and scales to multiple features and clients.
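The abstract only names the adapter-merging mechanism, so the following is a minimal sketch of how per-feature refusal adapters could be attached and merged, assuming standard LoRA tooling (Hugging Face transformers + peft). The base model, the adapter repositories (provider/locket-refuse-*), and the sequential merging loop are illustrative assumptions, not the paper's actual recipe.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

BASE_MODEL = "meta-llama/Llama-2-7b-hf"         # placeholder base model
REFUSAL_ADAPTERS = {                             # hypothetical adapter checkpoints
    "math":   "provider/locket-refuse-math",
    "coding": "provider/locket-refuse-coding",
}

def build_client_model(unlocked: set[str]):
    """Return a copy of the base LLM in which every feature the client has
    NOT unlocked carries a merged refusal adapter; unlocked features are
    left untouched to preserve utility."""
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    for feature, adapter_repo in REFUSAL_ADAPTERS.items():
        if feature in unlocked:
            continue                             # paid feature: no refusal adapter
        # Attach the refusal adapter for this locked feature and fold its
        # weights into the base model so it cannot simply be detached.
        model = PeftModel.from_pretrained(model, adapter_repo).merge_and_unload()
    return model

# Example: a client who paid for coding but not math gets a model that
# refuses math queries while coding ability is preserved.
coding_only_model = build_client_model(unlocked={"coding"})
```

Merging the refusal adapters into the base weights (rather than serving them as detachable adapters) is one plausible reading of the robustness goal: the client receives a single set of weights with no obvious component to strip off.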
Community
The widespread adoption of #LLM chatbot services has created a large and diverse user base, driving up computing and operational costs. Providers rely on tiered subscription plans to generate revenue 💰, offering black-box access to basic models for free users and advanced models to paying subscribers.
However, this all-or-nothing approach is unprofitable for providers and inflexible for users.
A pay-to-unlock scheme for premium features (e.g., math, coding) and specific model capabilities (e.g., medical diagnosis, age-gating) offers a more sustainable alternative (a minimal serving-side sketch follows this post). In this work, we present a feature-locking technique (FLoTE) that is:
- Effective in refusing locked features,
- Utility-preserving for unlocked features,
- Robust against evasion or unauthorized credential sharing, and
- Scalable to multiple features and clients.
This work represents an initial step towards more fine-grained control of generative model behaviour, potentially enabling many future applications.
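To make the pay-to-unlock idea concrete, here is a hypothetical serving-side gating sketch; it is not from the paper. The API keys, the ENTITLEMENTS table, and build_client_model (the adapter-merging sketch shown earlier) are all illustrative assumptions.

```python
from functools import lru_cache
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # same placeholder base

ENTITLEMENTS = {                        # illustrative billing records, not a real API
    "key-free-user":   frozenset(),
    "key-coding-plan": frozenset({"coding"}),
    "key-full-plan":   frozenset({"math", "coding"}),
}

@lru_cache(maxsize=8)
def model_for(unlocked: frozenset):
    # build_client_model is the hypothetical adapter-merging sketch shown above
    return build_client_model(set(unlocked))

def handle_request(api_key: str, prompt: str) -> str:
    """Serve a prompt with the model variant matching the caller's paid features."""
    unlocked = ENTITLEMENTS.get(api_key, frozenset())
    model = model_for(unlocked)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

In this sketch the provider caches one merged model per entitlement set, so adding a client only adds a lookup, while adding a feature only adds one refusal adapter.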
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Fine-Tuning Jailbreaks under Highly Constrained Black-Box Settings: A Three-Pronged Approach (2025)
- Unlocking the Effectiveness of LoRA-FP for Seamless Transfer Implantation of Fingerprints in Downstream Models (2025)
- Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing (2025)
- CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor (2025)
- CBP-Tuning: Efficient Local Customization for Black-box Large Language Models (2025)
- EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint (2025)
- ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot recommend