Speaking rhythm conversion by non-negative temporal decomposition

Speaking rhythm plays an important role in speech production and in the perception of non-native languages. However, conventional techniques are insufficient for controlling speaking rhythm. In this study, we developed a method that extracts the speaking rhythm from speech signals using non-negative temporal decomposition (NTD), and we used it to control the speaking rhythm. The algorithm decomposes a speech spectrogram into a set of temporally overlapping, phoneme-dependent event functions and their corresponding event vectors under speech-specific restrictions. We found that the speaking rhythm can be converted by modifying the obtained event functions. We hope that this technique will ease the burden of communicating in a non-native language.
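
The decomposition idea can be illustrated with a small sketch. The code below is not the NTD algorithm of this study: it uses plain non-negative matrix factorization with the standard multiplicative update rules for the squared error, and it omits the speech-specific restrictions mentioned above. The function names (decompose, stretch_events), the toy data, and the uniform time stretch used as a stand-in for rhythm modification are all assumptions made for illustration.

```python
import numpy as np

def decompose(Y, n_events, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative (frequency x time) matrix Y into A @ F.

    A holds one event vector per column, F one event function per row.
    Generic NMF multiplicative updates; no speech-specific restrictions.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    A = rng.random((n_freq, n_events)) + eps
    F = rng.random((n_events, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep A and F non-negative.
        F *= (A.T @ Y) / (A.T @ A @ F + eps)
        A *= (Y @ F.T) / (A @ F @ F.T + eps)
    return A, F

def stretch_events(F, rate):
    """Resample every event function onto a time axis scaled by `rate`."""
    n_events, n_frames = F.shape
    new_len = max(1, int(round(n_frames * rate)))
    src = np.linspace(0, n_frames - 1, new_len)
    frames = np.arange(n_frames)
    return np.vstack([np.interp(src, frames, f) for f in F])

# Toy "spectrogram": three overlapping bumps in time, random spectra.
t = np.arange(60)
F_true = np.vstack([np.exp(-0.5 * ((t - c) / 6.0) ** 2) for c in (10, 30, 50)])
A_true = np.random.default_rng(1).random((20, 3))
Y = A_true @ F_true

A, F = decompose(Y, n_events=3)
F_slow = stretch_events(F, rate=1.5)   # modify timing only
Y_slow = A @ F_slow                    # same event vectors, new rhythm
```

In this sketch the event vectors in A are left untouched and only the event functions in F are modified, which mirrors the idea that rhythm is carried by the timing of the events rather than by their spectral content. An actual rhythm conversion would presumably warp each event toward the timing of a target speaker instead of stretching all events uniformly.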

Demonstration

This demonstration shows an example in which the speaking rhythm of the English sentence "Rice is often served in round bowls." spoken by a native Japanese speaker is converted into the rhythm of a native English speaker.

References

Hiroya, S., "Non-negative temporal decomposition of speech parameters by multiplicative update rules," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2108-2117, 2013.
Hiroya, S., "Speech signal processing for speaking rhythm extraction and control (in Japanese)," NTT Technical Journal, vol. 25, no. 9, pp. 26-29, 2013.
Hiroya, S., "Speaking rhythm extraction and control by non-negative temporal decomposition," NTT Technical Review, vol. 11, no. 12, 2013.
