Human Science

Speak like a native

- Speech rhythm control by non-negative temporal decomposition -

Abstract

Speaking rhythm plays an important role in the production and perception of speech in non-native languages. However, conventional techniques cannot adequately control speaking rhythm. In this study, we developed a method that uses non-negative temporal decomposition (NTD) to extract the speaking rhythm from speech signals, and we used it to control that rhythm. The algorithm decomposes a speech spectrogram into a set of temporally overlapping, phoneme-dependent event functions and their corresponding event vectors under speech-specific constraints. We found that the speaking rhythm can be converted by modifying the obtained event functions. We hope that this technique will ease the burden of communicating in a non-native language.
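The decomposition described above can be illustrated with a minimal sketch. It is not the authors' NTD algorithm: it swaps in a generic non-negative matrix factorization with multiplicative updates, omits the speech-specific constraints entirely, and changes rhythm with a single uniform time stretch instead of per-phoneme timing. The function names `decompose` and `warp_events`, the toy spectrogram, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def decompose(Y, n_events, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative spectrogram Y (bins x frames) as A @ Phi.

    Generic NMF with multiplicative updates (squared-error cost).
    Assumption: stands in for NTD, without its speech-specific constraints.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = Y.shape
    A = rng.random((n_bins, n_events)) + eps      # columns: event vectors
    Phi = rng.random((n_events, n_frames)) + eps  # rows: event functions
    for _ in range(n_iter):
        Phi *= (A.T @ Y) / (A.T @ A @ Phi + eps)
        A *= (Y @ Phi.T) / (A @ Phi @ Phi.T + eps)
    return A, Phi

def warp_events(Phi, rate):
    """Resample each event function on a uniformly stretched time axis.

    rate < 1 slows the utterance down (more frames); rate > 1 speeds it up.
    """
    n_events, n_frames = Phi.shape
    n_out = max(2, int(round(n_frames / rate)))
    src = np.linspace(0, n_frames - 1, n_out)
    frames = np.arange(n_frames)
    return np.vstack([np.interp(src, frames, phi) for phi in Phi])

# Toy "spectrogram": three overlapping bumps in time, each with its own spectrum.
t = np.arange(60)
true_phi = np.stack([np.exp(-0.5 * ((t - c) / 6.0) ** 2) for c in (10, 30, 50)])
true_A = np.random.default_rng(1).random((20, 3))
Y = true_A @ true_phi

A, Phi = decompose(Y, n_events=3)
rel_err = np.linalg.norm(Y - A @ Phi) / np.linalg.norm(Y)

# Rhythm change: keep the event vectors, modify only the event functions.
Y_slow = A @ warp_events(Phi, rate=0.5)
print(Y.shape, Y_slow.shape, rel_err)
```

The point of the sketch is the division of labor: the event vectors in `A` carry the spectral content and stay fixed, while timing lives only in the event functions in `Phi`, so editing `Phi` alone changes the rhythm of the resynthesized spectrogram.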


References

S. Hiroya, “Non-negative temporal decomposition of speech parameters,” in Proc. ICASSP, 2010.
S. Hiroya and T. Kitamura, “Generation of a vocal-tract movie based on sparse sampling,” in Proc. ISSP, 2011.

Presenters

Sadao Hiroya
Human Information Science Laboratory

Takemi Mochida
Human Information Science Laboratory