Personalizing your speech recognizer

Neural network adaptation for automatic speech recognition

Abstract

Automatic speech recognition is increasingly part of everyday life. However, recognition accuracy varies widely from speaker to speaker. In this exhibit, we present a system that adapts to each speaker’s characteristics to maintain high recognition accuracy for all speakers. The proposed system first extracts the speaker’s voice characteristics and then uses them to adjust the neural network parameters for that speaker. Because a few seconds of speech are sufficient to estimate the voice characteristics, the system can adapt a large number of network parameters using very little speech data. This approach could also be extended to other types of acoustic variation, such as noise, to realize noise-robust speech recognition. Moreover, the same idea could be applied to other AI problems in which the behavior of a network should be controlled depending on the characteristics of the input.
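
The two steps described above (estimate the speaker's voice characteristics, then use them to adjust network parameters) can be illustrated with a toy sketch. It is not the exhibited system. It assumes the voice characteristics are simply the mean of the acoustic feature frames, and that adaptation takes the form of a per-unit scale and shift applied to one hidden layer. All names (`speaker_embedding`, `adapted_hidden_layer`, `A_scale`, `A_shift`) are hypothetical, and random weights stand in for trained ones.

```python
import random


def speaker_embedding(frames):
    """Average the feature frames over time into one fixed-size vector.

    Assumption: a plain mean stands in for the voice characteristics.
    """
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def adapted_hidden_layer(frame, embedding, params):
    """ReLU hidden layer whose units are rescaled and shifted per speaker."""
    pre = [a + b for a, b in zip(matvec(params["W"], frame), params["b"])]
    # Auxiliary linear maps turn the embedding into per-unit scale and shift.
    # A zero embedding gives scale 1 and shift 0, i.e. the unadapted layer.
    scale = [1.0 + s for s in matvec(params["A_scale"], embedding)]
    shift = matvec(params["A_shift"], embedding)
    return [max(0.0, s * p + t) for s, p, t in zip(scale, pre, shift)]


def random_matrix(rows, cols, rng, std=0.1):
    """Random weights used here only as placeholders for trained ones."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
feat_dim, hidden_dim = 4, 3
params = {
    "W": random_matrix(hidden_dim, feat_dim, rng),
    "b": [0.0] * hidden_dim,
    "A_scale": random_matrix(hidden_dim, feat_dim, rng),
    "A_shift": random_matrix(hidden_dim, feat_dim, rng),
}

# Toy adaptation data: 300 frames per speaker, drawn around different means.
speaker_a = [[rng.gauss(1.0, 0.5) for _ in range(feat_dim)] for _ in range(300)]
speaker_b = [[rng.gauss(-1.0, 0.5) for _ in range(feat_dim)] for _ in range(300)]
emb_a = speaker_embedding(speaker_a)
emb_b = speaker_embedding(speaker_b)

# The same input frame is processed differently for each speaker.
frame = [0.5, -0.2, 0.1, 0.3]
out_a = adapted_hidden_layer(frame, emb_a, params)
out_b = adapted_hidden_layer(frame, emb_b, params)
print(out_a)
print(out_b)
```

In this sketch the only quantity estimated per speaker is the small embedding vector. The auxiliary maps that convert it into layer adjustments would be trained beforehand, which is one way a short utterance could steer many parameters at once.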

Presenters

Marc Delcroix
Media Information Laboratory

Takuya Higuchi
Media Information Laboratory

Keisuke Kinoshita
Media Information Laboratory

Atsunori Ogawa
Media Information Laboratory

Shigeki Karita
Media Information Laboratory

Tomohiro Nakatani
Media Information Laboratory
