Exhibition Program
Media Information Science
08

Listening to what you want!

Real-time selective listening of everyday sounds

Abstract

Humans can selectively listen to a target sound even when many sounds overlap. This research brings that capability to computers by developing real-time target sound extraction, which isolates the desired audio from a mixed signal on a general-purpose PC while maintaining high accuracy. Incorporating an audio foundation model developed at NTT, which provides general-purpose sound representations, further improves extraction accuracy and sound quality. We also implement binaural processing to estimate the direction of arrival, bringing the system closer to human listening. Ultimately, the technology lets users flexibly hear or suppress sounds depending on the context: for example, reducing household noise during remote-work meetings while preserving meaningful sounds during family calls. This enables more comfortable and effective communication.
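
To make the binaural cue concrete, the sketch below estimates the interaural time difference (ITD) between the left and right channels by cross-correlation and converts it to an approximate azimuth. It is a classical signal-processing illustration only, not the neural method described in this exhibit. The function names, the 0.18 m ear spacing, the 1 ms search range, and the far-field model are all assumptions made for the example.

```python
import numpy as np


def estimate_itd(left, right, sample_rate, max_itd_s=1e-3):
    """Estimate the delay of `right` relative to `left`, in seconds.

    A positive value means the sound reached the right ear later.
    max_itd_s is an assumed physical upper bound on the delay.
    """
    max_lag = int(round(max_itd_s * sample_rate))
    # corr[k] = sum_n right[n + k] * left[n], so the peak sits at the
    # lag by which `right` trails `left`.
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    # Only search lags that are physically plausible for a human head.
    mask = np.abs(lags) <= max_lag
    best_lag = lags[mask][np.argmax(corr[mask])]
    return best_lag / sample_rate


def itd_to_azimuth_deg(itd_s, ear_distance_m=0.18, speed_of_sound=343.0):
    """Map an ITD to an azimuth with a simple far-field model (assumption).

    Positive angles point toward the left ear, 0 is straight ahead.
    """
    ratio = np.clip(speed_of_sound * itd_s / ear_distance_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    left = rng.standard_normal(1600)
    # Simulate a source on the left: the right ear hears it 5 samples later.
    right = np.concatenate([np.zeros(5), left[:-5]])
    itd = estimate_itd(left, right, sr)
    print(itd, itd_to_azimuth_deg(itd))
```

A cross-correlation peak is easy to inspect but degrades when several sources overlap, which is the situation this research targets.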

References

[1] M. Delcroix, J. B. Vázquez, T. Ochiai, K. Kinoshita, Y. Ohishi, S. Araki, “SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 31, pp. 121-136, 2022.

[2] K. Wakayama, T. Ochiai, M. Delcroix, M. Yasuda, S. Saito, S. Araki, A. Nakayama, “Online target sound extraction with knowledge distillation from partially non-causal teacher,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 561-565, 2024.

[3] C. Hernandez-Olivan, M. Delcroix, T. Ochiai, D. Niizumi, N. Tawara, T. Nakatani, S. Araki, “SoundBeam meets M2D: Target sound extraction with audio foundation model,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025.

[4] C. Hernandez-Olivan, M. Delcroix, T. Ochiai, N. Tawara, T. Nakatani, S. Araki, “Interaural time difference loss for binaural target sound extraction,” in Proc. 18th International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 210-214, 2024.

Contact

Marc Delcroix, Media Information Laboratory, Signal Processing Research Group
