Science of Media Information

Exhibition Program 16

Pay attention to the speaker you want to listen to

Computational selective hearing based on deep learning


When several people speak at the same time in a conversation, human listeners can focus on the speaker they want to hear, an ability known as selective hearing. Current computers and voice assistant devices, however, often struggle with such overlapping speech. Our research aims to realize computational selective hearing, which enables a computer to listen to a target speaker while ignoring the other speakers.
We use our recently developed context adaptive neural network and propose informing it about the target speaker’s voice characteristics so that it extracts only that speaker’s voice from the mixture. This technology paves the way for more natural voice assistants that can focus on a target speaker in the same way that people do.
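The idea of informing a network about the target speaker can be illustrated with a small sketch. This is not the system presented here: the layer sizes, the random untrained weights, and the function names (init_params, speaker_weights, extract_target) are all assumptions chosen for illustration. The sketch assumes that a short recording of the target speaker alone is summarized into a set of weights, and that these weights blend several sub-layers of a mask-estimating network applied to the mixture spectrogram.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_freq, n_hidden, n_sub, seed=0):
    # Hypothetical, untrained parameters; a real system would learn these.
    rng = np.random.default_rng(seed)
    return {
        "W_aux": rng.normal(0, 0.1, (n_sub, n_freq)),            # speaker summary -> sub-layer weights
        "W_sub": rng.normal(0, 0.1, (n_sub, n_hidden, n_freq)),  # one weight matrix per sub-layer
        "W_out": rng.normal(0, 0.1, (n_freq, n_hidden)),         # hidden -> mask
    }

def speaker_weights(params, enrollment_spec):
    # enrollment_spec: (frames, n_freq) magnitude spectrogram of the target speaker alone.
    summary = enrollment_spec.mean(axis=0)        # average over time
    return softmax(params["W_aux"] @ summary)     # non-negative weights that sum to 1

def extract_target(params, mixture_spec, alpha):
    # mixture_spec: (frames, n_freq) magnitude spectrogram of the overlapping speech.
    # Each sub-layer processes the mixture: result has shape (n_sub, frames, n_hidden).
    sub_out = np.einsum("shf,tf->sth", params["W_sub"], mixture_spec)
    # Blend the sub-layers with the speaker-dependent weights, then apply ReLU.
    hidden = np.maximum(0.0, np.tensordot(alpha, sub_out, axes=1))
    # Estimate a time-frequency mask in (0, 1) and apply it to the mixture.
    mask = sigmoid(hidden @ params["W_out"].T)
    return mask * mixture_spec

if __name__ == "__main__":
    params = init_params(n_freq=8, n_hidden=16, n_sub=3)
    rng = np.random.default_rng(1)
    enrollment = np.abs(rng.normal(size=(20, 8)))  # stand-in for the target speaker's recording
    mixture = np.abs(rng.normal(size=(30, 8)))     # stand-in for the mixed speech
    alpha = speaker_weights(params, enrollment)
    estimate = extract_target(params, mixture, alpha)
    print(alpha.shape, estimate.shape)
```

Because the weights here are random, the output is not a meaningful separation; the sketch only shows how the same network can be steered toward a different speaker by changing the enrollment recording, without retraining.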


  • [1] K. Zmolikova, M. Delcroix, K. Kinoshita, T. Higuchi, A. Ogawa, T. Nakatani, “Speaker-aware neural network based beamformer for speaker extraction in speech mixtures,” in Proc. Interspeech, 2017.
  • [2] M. Delcroix, K. Zmolikova, K. Kinoshita, A. Ogawa, T. Nakatani, “Single channel speaker extraction and recognition with Speaker Beam,” in Proc. of 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’18), 2018.




Marc Delcroix
Media Information Laboratory