HOME / Exhibition Program / Do not compute: Fast approach for vector search
Exhibition Program
Machine Learning Science
04

Do not compute: Fast approach for vector search

Accelerating ScaNN via pruning-based vector quantization

Do not compute: Fast approach for vector search
Abstract

Quantization, which replaces vectors with codewords, is widely used to enable fast, accurate inner-product-based similarity search over large-scale data. ScaNN is a popular approach for quantization. ScaNN computes the quantization error between each vector and all possible codewords to select the codeword with the smallest error, achieving high approximation accuracy. However, since ScaNN requires error computation with all codewords, it incurs a high computational cost, making quantization extremely slow for large-scale datasets. The proposed approach uses upper bounds on quantization errors and efficiently evaluates them to prune codeword candidates. This significantly reduces the number of error computations required. As a result, the proposed approach can substantially accelerate vector quantization while preserving ScaNN's search accuracy. Consequently, it facilitates practical large-scale data processing in applications such as image retrieval and natural language processing.

Do not compute: Fast approach for vector search
References

[1] Y. Fujiwara, Á. López, Y. Ida, A. Kumagai, M. Nakano, M. Nakatsuji, A. Kimura, “Fast Vector Quantization Algorithm for ScaNN”, in Proc KDD, 2026.

Poster
Contact

Yasuhiro Fujiwara, Recognition Research Group, Media Information Laboratory

Click here for other research exhibits