RECENT ADVANCES IN DISTANT SPEECH RECOGNITION

    This is the webpage associated with the Interspeech 2016 tutorial titled "Recent advances in distant speech recognition".
    The slides used during the tutorial are available here.

    You can find links to the resources (tools and data sets) referred to during the tutorial in the Resources section, and a list of references used to prepare the tutorial in the References section.

Abstract

    Automatic speech recognition (ASR) is being successfully deployed in more and more products, such as voice search applications for mobile devices. However, recognition remains challenging when the speaker is distant from the microphone, because of noise, attenuation, and reverberation. Research on distant ASR has received increased attention and has progressed rapidly, driven by 1) the emergence of deep neural network (DNN) based ASR systems, 2) the launch of recent challenges such as the CHiME series, REVERB, ASpIRE, and DIRHA, and 3) the development of new products such as the Microsoft Kinect and the Amazon Echo. This tutorial will review the recent progress made in the field of distant speech recognition in the DNN era, including single- and multi-channel speech enhancement front-ends and acoustic modeling techniques for robust back-ends. The tutorial will also introduce practical schemes for building distant ASR systems based on the expertise acquired from past challenges.
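
    As a rough illustration of the kind of multi-channel enhancement front-end covered in the tutorial, the sketch below shows a basic delay-and-sum beamformer in Python/NumPy. This is not code from the tutorial: the function name delay_and_sum, the array shapes, and the integer-sample delays are illustrative assumptions, and practical front-ends typically work in the short-time Fourier transform domain with fractional delays and more advanced filters.

    # Minimal delay-and-sum beamformer sketch (illustrative only, not the tutorial's code).
    # Assumption: `signals` is an (n_mics, n_samples) array of microphone recordings and
    # `delays` gives the per-microphone steering delay in whole samples.
    import numpy as np

    def delay_and_sum(signals, delays):
        """Align each channel by its steering delay and average across microphones."""
        n_mics, n_samples = signals.shape
        aligned = np.zeros_like(signals, dtype=float)
        for m in range(n_mics):
            d = int(delays[m])
            # Shift channel m earlier by d samples so all channels line up in time.
            aligned[m, :n_samples - d] = signals[m, d:]
        return aligned.mean(axis=0)

    # Toy usage: the same source reaches the second microphone one sample later.
    source = np.random.randn(16000)
    mics = np.stack([source, np.roll(source, 1)])
    enhanced = delay_and_sum(mics, delays=[0, 1])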

Presenters

  • Marc Delcroix, NTT Communication Science Laboratories, Japan
    • Marc Delcroix received the M.Eng. degree from the Free University of Brussels, Brussels, Belgium, and the Ecole Centrale Paris, Paris, France, in 2003, and the Ph.D. degree from the Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan, in 2007. From 2004 to 2008, he was a research student and research associate at NTT Communication Science Laboratories, Kyoto, Japan. From 2008 to 2010, he worked at Pixela on software development for digital television. He joined NTT Communication Science Laboratories, Kyoto, Japan, in 2010, where he is a senior research scientist. He is also a visiting lecturer at the Faculty of Science and Engineering of Waseda University, Tokyo, Japan. He is a senior member of the IEEE Signal Processing Society and a member of the Acoustical Society of Japan (ASJ).
      His research interests include robust distant speech recognition, acoustic model adaptation, integration of speech enhancement front-ends and recognition back-ends, speech enhancement, and speech dereverberation. He took an active part in the development of the NTT robust speech recognition systems for the REVERB challenge and the CHiME 1 and 3 challenges, which all achieved the best performance on their respective tasks. He was one of the organizers of the REVERB challenge 2014.
  • Shinji Watanabe, Mitsubishi Electric Research Laboratories, USA
    • Shinji Watanabe received the Dr. Eng. degree from Waseda University, Japan, in 2006. From 2001 to 2011, he worked at NTT Communication Science Laboratories, Japan. Since 2012, he has been working at Mitsubishi Electric Research Laboratories, USA. His research interests include machine learning, Bayesian inference, speech recognition, and spoken language processing. He is a member of the Acoustical Society of Japan (ASJ) and the Institute of Electronics, Information and Communication Engineers (IEICE), and a senior member of the IEEE. He is currently an Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing, and serves on several committees, including the IEEE Signal Processing Society Speech and Language Technical Committee (IEEE SLTC). He gave a tutorial at ICASSP 2012 on "Bayesian Learning for Speech and Language Processing".
      Recently, he has been actively working on noise-robust speech recognition for distant-talk scenarios. He participated in the CHiME 1, CHiME 2 track 2, REVERB, and CHiME 3 challenges, in which his team placed 1st, 1st, 2nd, and 2nd, respectively. He also organized the CHiME 2 and CHiME 3 speech separation and recognition challenges, and led the "Far-Field Speech Enhancement and Recognition in Mismatched Settings" research group at the 2015 Jelinek Summer Workshop on Speech and Language Technology as a senior team member.