Do not compute: Fast approach for vector search｜Exhibition Program｜NTT Communication Science Laboratories OPEN HOUSE 2026

Exhibition Program

Machine Learning Science

04 Do not compute: Fast approach for vector search
Accelerating ScaNN via pruning-based vector quantization

Abstract

Quantization, which replaces each vector with a codeword, is widely used to enable fast and accurate inner-product-based similarity search over large-scale data. ScaNN is a popular quantization-based approach. To achieve high approximation accuracy, ScaNN computes the quantization error between each vector and every possible codeword and selects the codeword with the smallest error. Because this requires an error computation for every codeword, however, it incurs a high computational cost and makes quantization extremely slow on large-scale datasets. The proposed approach uses upper bounds on quantization errors, evaluating them efficiently to prune codeword candidates, which significantly reduces the number of error computations required. As a result, it substantially accelerates vector quantization while preserving ScaNN's search accuracy, facilitating practical large-scale data processing in applications such as image retrieval and natural language processing.
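The contrast between exhaustive codeword selection and bound-based pruning can be illustrated with a minimal sketch. It assumes a single flat codebook (no product quantization) and uses the score-aware (anisotropic) loss from the original ScaNN paper, which weights the residual component parallel to the data vector by `h_par` and the orthogonal component by `h_perp`; the weight values are hypothetical. The pruning rule is a swapped-in, Elkan-style triangle-inequality test chosen for illustration. It is not the bound used in the proposed approach. All function names are hypothetical, and data vectors are assumed to be nonzero.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def anisotropic_loss(x: Vector, c: Vector, h_par: float, h_perp: float) -> float:
    """Loss of encoding nonzero x by codeword c: the residual r = x - c is split
    into a part parallel to x (weight h_par) and an orthogonal part (weight h_perp)."""
    r = [xi - ci for xi, ci in zip(x, c)]
    r_sq = dot(r, r)
    r_par_sq = dot(r, x) ** 2 / dot(x, x)  # squared length of r projected onto x
    r_perp_sq = max(r_sq - r_par_sq, 0.0)  # Pythagoras; clamp tiny negative rounding
    return h_par * r_par_sq + h_perp * r_perp_sq


def assign_brute_force(x: Vector, codebook: Sequence[Vector],
                       h_par: float, h_perp: float) -> Tuple[int, float, int]:
    """Exhaustive search: evaluate the loss against every codeword."""
    best_j, best_loss = -1, math.inf
    for j, c in enumerate(codebook):
        loss = anisotropic_loss(x, c, h_par, h_perp)
        if loss < best_loss:
            best_j, best_loss = j, loss
    return best_j, best_loss, len(codebook)  # index, loss, loss evaluations


def pairwise_distances(codebook: Sequence[Vector]) -> List[List[float]]:
    """Euclidean distances between all codeword pairs, computed once per codebook."""
    return [[math.dist(a, b) for b in codebook] for a in codebook]


def assign_pruned(x: Vector, codebook: Sequence[Vector],
                  dists: Sequence[Sequence[float]],
                  h_par: float, h_perp: float) -> Tuple[int, float, int]:
    """Same minimum loss as assign_brute_force, skipping codewords that provably
    cannot improve on the current best (assumes a non-empty codebook, h > 0).

    For a fixed x, sqrt(loss) is a norm of the residual, and that norm is at least
    s = sqrt(min(h_par, h_perp)) times the Euclidean norm. By the triangle
    inequality, sqrt(loss_j) >= s * dists[best][j] - sqrt(best_loss), so if
    s * dists[best][j] >= 2 * sqrt(best_loss), then loss_j >= best_loss.
    """
    s = math.sqrt(min(h_par, h_perp))
    best_j, best_loss = 0, anisotropic_loss(x, codebook[0], h_par, h_perp)
    evaluations = 1
    for j in range(1, len(codebook)):
        if s * dists[best_j][j] >= 2.0 * math.sqrt(best_loss):
            continue  # pruned without computing the loss
        loss = anisotropic_loss(x, codebook[j], h_par, h_perp)
        evaluations += 1
        if loss < best_loss:
            best_j, best_loss = j, loss
    return best_j, best_loss, evaluations


if __name__ == "__main__":
    rng = random.Random(0)
    dim, k, n = 8, 256, 100
    codebook = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
    dists = pairwise_distances(codebook)
    h_par, h_perp = 4.0, 1.0  # hypothetical weights, chosen for illustration
    total = 0
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        _, exact, _ = assign_brute_force(x, codebook, h_par, h_perp)
        _, pruned, evals = assign_pruned(x, codebook, dists, h_par, h_perp)
        assert math.isclose(exact, pruned, rel_tol=1e-9, abs_tol=1e-12)
        total += evals
    print(f"average loss evaluations per vector: {total / n:.1f} of {k}")
```

Precomputing the codeword-to-codeword distances costs O(K²) once per codebook, but that cost is shared by every data vector being encoded, which is where a pruning scheme can pay off. How many evaluations are actually skipped depends on the data, the codebook, and the scan order, so the sketch makes no claim about speedups.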


References

[1] Y. Fujiwara, Á. López, Y. Ida, A. Kumagai, M. Nakano, M. Nakatsuji, A. Kimura, “Fast Vector Quantization Algorithm for ScaNN”, in Proc. KDD, 2026.


Contact

Yasuhiro Fujiwara, Recognition Research Group, Media Information Laboratory

