Media Intelligence

Extracting essential information from sounds

- Advances in distant speech recognition by deep learning -

Abstract

We are working on conversational speech recognition and communication scene analysis in real-world sound environments. Deep learning (DL) is an essential technique for realizing both, and we have proposed a variety of DL-based speech processing methods. Beyond speech recognition itself, where DL is already widely employed, these include speech enhancement and acoustic event detection. Together, these methods achieve excellent recognition performance for conversational speech and extend the usability of voice interfaces to real, noisy everyday scenes.