arXiv:2507.23539

Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

Published on Jul 31, 2025

Abstract

AI-generated summary: An algorithm is developed for efficient computation of matrix-vector products for Gaussian kernel matrices, enabling fast attention mechanisms in large language models.

Motivated by the problem of fast processing of attention matrices, we study fast algorithms for computing matrix-vector products for asymmetric Gaussian kernel matrices K \in \mathbb{R}^{n \times n}. The columns of K are indexed by a set of n keys k_1, k_2, \ldots, k_n \in \mathbb{R}^d, its rows by a set of n queries q_1, q_2, \ldots, q_n \in \mathbb{R}^d, and its (i, j) entry is K_{ij} = e^{-\|q_i - k_j\|_2^2 / 2\sigma^2} for some bandwidth parameter \sigma > 0. Given a vector x \in \mathbb{R}^n and an error parameter \epsilon > 0, our task is to output a y \in \mathbb{R}^n such that \|Kx - y\|_2 \leq \epsilon \|x\|_2 in time subquadratic in n and linear in d. Our algorithms rely on the following modelling assumption about the matrices K: the sum of the entries of K scales linearly in n, as opposed to worst-case quadratic growth. We validate this assumption experimentally for Gaussian kernel matrices encountered in various settings, such as fast attention computation in LLMs. We obtain the first subquadratic-time algorithm that works under this assumption for unrestricted vectors.
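To make the setup concrete, the following minimal NumPy sketch (not the paper's algorithm) builds the Gaussian kernel matrix from queries and keys, computes the exact product Kx as the quadratic-time baseline that the paper aims to beat, and measures the total kernel mass to illustrate the linear-mass assumption. The function name, the random test data, and the choice of sigma are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(Q, K, sigma):
    """Dense Gaussian kernel: M[i, j] = exp(-||q_i - k_j||_2^2 / (2 * sigma^2))."""
    # Squared pairwise distances via ||q - k||^2 = ||q||^2 + ||k||^2 - 2 <q, k>.
    sq = (Q ** 2).sum(1)[:, None] + (K ** 2).sum(1)[None, :] - 2.0 * Q @ K.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))  # clamp FP noise below 0

rng = np.random.default_rng(0)
d, sigma = 64, 1.0                # illustrative dimension and bandwidth
for n in (256, 512, 1024, 2048):
    Q = rng.normal(size=(n, d))   # queries (placeholder random data)
    K = rng.normal(size=(n, d))   # keys
    M = gaussian_kernel(Q, K, sigma)
    x = rng.normal(size=n)
    y = M @ x                     # exact O(n^2 d) baseline the paper improves on
    # Under the linear-mass assumption, total mass / n stays roughly constant as n grows.
    print(f"n={n:5d}  mass per row = {M.sum() / n:.4f}")
```

Whether the mass-per-row statistic actually stays bounded depends on the data distribution and the bandwidth; the paper validates the assumption on kernel matrices arising in practice, such as attention matrices in LLMs, whereas the random inputs above are only a placeholder for running the check.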
