Science of Media Information

Exhibition Program 21

Turn-taking matters in conversation recognition

Robust speech processing using speakers’ activity estimation

Abstract

We present speech processing technologies developed for conversational speech recognition. We focus on speaker activity estimation (estimating the periods in which each speaker is talking), which plays an important role in conversational speech recognition. We show that a target speaker’s speech can be enhanced in a recorded conversation by controlling the speech enhancement process according to the estimated speaker activities. Speech recognition accuracy can also be improved by introducing speaker activity information, including turn-taking information, into the language model of a speech recognition system. We also present our newly developed speaker activity estimation method, which is based on a probabilistic model of speaker spatial information. With these technologies, we aim to contribute to a more natural voice interface for everyday speech communication.
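
The abstract does not describe the probabilistic model in detail, so the following is only a minimal illustrative sketch of the general idea, not the presenters' method. It assumes that a per-frame direction-of-arrival (DOA) estimate in degrees is already available, that silent frames are marked None, and that two speakers sit at clearly different directions. A one-dimensional Gaussian mixture is fitted to the DOA values with EM, and each frame is assigned to the speaker with the highest posterior. The function names, the toy DOA values, and the initial means are all hypothetical.

```python
import math

def frame_posteriors(v, weights, means, variances):
    """Posterior probability of each speaker for one DOA value (log domain)."""
    logs = [math.log(w) - 0.5 * math.log(2 * math.pi * s) - 0.5 * (v - m) ** 2 / s
            for w, m, s in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def fit_gmm_1d(values, init_means, n_iter=20, var_floor=1.0):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    k = len(init_means)
    means = list(init_means)
    variances = [100.0] * k          # broad initial spread (assumption)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: speaker posteriors for every voiced frame
        resp = [frame_posteriors(v, weights, means, variances) for v in values]
        # M-step: re-estimate each speaker's weight, direction, and spread
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue             # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            variances[j] = max(
                var_floor,
                sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj)
    return weights, means, variances

def label_frames(doas, weights, means, variances):
    """Per-frame speaker index, or None for frames without a DOA estimate."""
    labels = []
    for v in doas:
        if v is None:
            labels.append(None)
        else:
            post = frame_posteriors(v, weights, means, variances)
            labels.append(post.index(max(post)))
    return labels

def count_speaker_changes(labels):
    """Number of speaker changes (a crude turn-taking statistic), skipping silence."""
    voiced = [l for l in labels if l is not None]
    return sum(1 for a, b in zip(voiced, voiced[1:]) if a != b)

# Toy data: hypothetical per-frame DOA estimates in degrees, None = silence.
doas = [-42, -39, None, 34, 36, -41, None, 33, 37, -40]
voiced = [v for v in doas if v is not None]
weights, means, variances = fit_gmm_1d(voiced, init_means=[-60.0, 60.0])
labels = label_frames(doas, weights, means, variances)
changes = count_speaker_changes(labels)
print(labels, changes)
```

The per-frame labels are one possible form of speaker activity information; in a real system they might gate a speech enhancement stage or be passed to the language model as turn-taking cues, as the abstract outlines. This sketch treats angles as ordinary real numbers and ignores wrap-around at ±180 degrees, overlapping speech, and noise in the DOA estimates.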

Presenters

Araki Shoko
Media Information Laboratory
Nobutaka Ito
Media Information Laboratory
Atsunori Ogawa
Media Information Laboratory
Keisuke Kinoshita
Media Information Laboratory
Takuya Higuchi
Media Information Laboratory
Tomohiro Nakatani
Media Information Laboratory