Can you trust these search results?｜Exhibition Program｜NTT Communication Science Laboratories OPEN HOUSE 2026

Exhibition Program

Communication and Mathematical Science

14	Can you trust these search results? Hub text identification for cross-modal embeddings

Abstract

Cross-modal embeddings, which map images and texts into a shared space, enable cross-modal retrieval. However, “hub” embeddings that exhibit spuriously high similarity to many queries regardless of true relevance are widely observed and degrade retrieval reliability. To analyze the nature of hubs, we propose a method for identifying “hub texts,” which show unreasonably high similarity to many unrelated images. We demonstrate that these texts significantly degrade retrieval performance in practice. This identification is essential for understanding the behavior of hub texts and is a key step toward mitigating their impact. Despite recent advances in AI, including embedding models, reliability remains an open challenge. In particular, the conditions under which models exhibit unexpected behavior are not yet well understood. Our findings contribute to a deeper understanding of model behavior and reliability in modern AI systems.

References

[1] H. Deguchi, K. Chousa, Y. Sakai, “One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness,” in Proc. The 64th Annual Meeting of the Association for Computational Linguistics (ACL2026), 2026. (to appear)

[2] H. Deguchi, K. Chousa, Y. Sakai, “Hacking Neural Evaluation Metrics with Single Hub Text,” in Proc. The 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL2026), pp. 198-206, 2026.

Poster

Please click the icon to open the full-size PDF file.

Contact

Hiroyuki Deguchi, Linguistic Intelligence Research Group, Innovative Communication Laboratory

Click here for other research exhibits

01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22

Can you trust these search results?

Hub text identification for cross-modal embeddings

Contact

Download