Listening to what you want!｜Exhibition Program｜NTT Communication Science Laboratories OPEN HOUSE 2026

Exhibition Program

Media Information Science

08	Listening to what you want! Real-time selective listening of everyday sounds

Abstract

Humans can selectively listen to a target sound even when many sounds overlap. This research brings that capability to computers by developing real-time target sound extraction that isolates desired audio from mixed signals on general-purpose PCs while maintaining high accuracy. By incorporating an audio foundation model with general sound representations developed at NTT, the method further improves extraction accuracy and sound quality. We also implement binaural processing to estimate the direction of arrival, making the system closer to human listening. Ultimately, the technology lets users flexibly hear or suppress sounds depending on the context. For example, it can reduce household noise in remote-work meetings while preserving meaningful sounds during family calls, enabling more comfortable and effective communication.
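
As a rough illustration of how a direction of arrival can be read from a binaural (two-ear) signal, the sketch below estimates the interaural time difference (ITD) by cross-correlating the left and right channels and converts it to an azimuth with a simple far-field model. This is a textbook approach shown for intuition only, not the method used in the exhibit. The function names, the 1 ms search range, the ear spacing, and the speed of sound are all assumptions introduced for this example.

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C
EAR_DISTANCE_M = 0.18       # assumed typical spacing between the ears


def estimate_itd(left, right, sample_rate, max_itd_s=1e-3):
    """Return the ITD in seconds, estimated by cross-correlation.

    A positive value means the right channel lags the left one,
    i.e. the sound reached the left ear first.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # corr[i] = sum_n right[n + lag] * left[n], with lag = lags[i]
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    # Only search lags that are physically plausible for a human head.
    max_lag = int(round(max_itd_s * sample_rate))
    plausible = np.abs(lags) <= max_lag
    best_lag = lags[plausible][np.argmax(corr[plausible])]
    return best_lag / sample_rate


def itd_to_azimuth_deg(itd_s):
    """Map an ITD to an azimuth in degrees (far-field model, assumed).

    0 is straight ahead; positive angles point toward the left ear.
    """
    ratio = np.clip(itd_s * SPEED_OF_SOUND_M_S / EAR_DISTANCE_M, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))


if __name__ == "__main__":
    # Synthetic test signal: white noise that reaches the right ear
    # 5 samples later than the left ear.
    rng = np.random.default_rng(0)
    source = rng.standard_normal(1600)
    delayed = np.concatenate([np.zeros(5), source[:-5]])
    itd = estimate_itd(source, delayed, sample_rate=16000)
    print("ITD (s):", itd, "azimuth (deg):", itd_to_azimuth_deg(itd))
```

Plain cross-correlation works well on the synthetic noise above. Recordings with reverberation or several overlapping sources generally need more robust estimators, which is part of what makes binaural target sound extraction difficult.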

References

[1] M. Delcroix, J. B. Vázquez, T. Ochiai, K. Kinoshita, Y. Ohishi, S. Araki, “SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 31, pp. 121-136, 2022.

[2] K. Wakayama, T. Ochiai, M. Delcroix, M. Yasuda, S. Saito, S. Araki, A. Nakayama, “Online target sound extraction with knowledge distillation from partially non-causal teacher,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 561-565, 2024.

[3] C. Hernandez-Olivan, M. Delcroix, T. Ochiai, D. Niizumi, N. Tawara, T. Nakatani, S. Araki, “SoundBeam meets M2D: Target sound extraction with audio foundation model,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025.

[4] C. Hernandez-Olivan, M. Delcroix, T. Ochiai, N. Tawara, T. Nakatani, S. Araki, “Interaural time difference loss for binaural target sound extraction,” in Proc. 18th International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 210-214, 2024.

Contact

Marc Delcroix, Media Information Laboratory, Signal Processing Research Group