| 04 |
Do not compute: Fast approach for vector searchAccelerating ScaNN via pruning-based vector quantization
|
|---|
Quantization, which replaces vectors with codewords, is widely used to enable fast, accurate inner-product-based similarity search over large-scale data. ScaNN is a popular approach for quantization. ScaNN computes the quantization error between each vector and all possible codewords to select the codeword with the smallest error, achieving high approximation accuracy. However, since ScaNN requires error computation with all codewords, it incurs a high computational cost, making quantization extremely slow for large-scale datasets. The proposed approach uses upper bounds on quantization errors and efficiently evaluates them to prune codeword candidates. This significantly reduces the number of error computations required. As a result, the proposed approach can substantially accelerate vector quantization while preserving ScaNN's search accuracy. Consequently, it facilitates practical large-scale data processing in applications such as image retrieval and natural language processing.
[1] Y. Fujiwara, Á. López, Y. Ida, A. Kumagai, M. Nakano, M. Nakatsuji, A. Kimura, “Fast Vector Quantization Algorithm for ScaNN”, in Proc KDD, 2026.
Yasuhiro Fujiwara, Recognition Research Group, Media Information Laboratory