Science of Media Information

Intonation morphing from one speaker to another

- Prosody conversion with generative voice F0 contour model -

Abstract

A voice F0 contour, one of the essential components of human speech prosody, is generated by the vocal folds, whose tension is controlled by the movement of the thyroid cartilage. This mechanism is known to be well described by the Fujisaki model, a physical model consisting of a set of intonation (shift) and accent (rotation) parameters, but accurate estimation of those parameters has long been considered difficult. Our proposed method, "SPACE", makes it possible to accurately estimate the movements of the thyroid cartilage from speech, taking advantage of our newly established statistical framework with a specially designed HMM. Here we demonstrate intonation morphing, where you can speak with your own voice while borrowing another person’s intonation and accent. Potential applications of the system range from naturalness-guaranteed text-to-speech and voice conversion systems to self-training systems for improving presentation and language skills.
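To make the role of the intonation and accent parameters concrete, the following is a minimal sketch of the forward direction of the Fujisaki model only: it generates an F0 contour from a given set of commands. It is not the SPACE estimation method, which solves the much harder inverse problem. The constants ALPHA, BETA and GAMMA are commonly cited textbook values, and the command timings, magnitudes, and the 120 Hz base frequency are hypothetical numbers chosen for illustration. The last lines imitate intonation morphing in the simplest possible way, by combining one speaker's base frequency with another speaker's commands; this is an assumption about how the idea could be illustrated, not a description of the demonstrated system.

```python
import math

# Commonly cited default constants for the Fujisaki model (assumed here).
ALPHA = 3.0   # natural angular frequency of the phrase (intonation) control, 1/s
BETA = 20.0   # natural angular frequency of the accent control, 1/s
GAMMA = 0.9   # ceiling of the accent response


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase (intonation) control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_hz, phrase_cmds, accent_cmds):
    """Log F0 at time t: log base frequency plus phrase and accent components.

    phrase_cmds: list of (onset_s, magnitude) impulses.
    accent_cmds: list of (onset_s, offset_s, amplitude) rectangular pulses.
    """
    value = math.log(base_hz)
    for onset, magnitude in phrase_cmds:
        value += magnitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_cmds:
        value += amplitude * (accent_response(t - onset) - accent_response(t - offset))
    return value


def f0_contour(base_hz, phrase_cmds, accent_cmds, duration_s, step_s=0.01):
    """Sample the F0 contour in Hz every step_s seconds over duration_s."""
    n = int(round(duration_s / step_s)) + 1
    return [math.exp(log_f0(i * step_s, base_hz, phrase_cmds, accent_cmds))
            for i in range(n)]


# Hypothetical commands standing in for those estimated from another speaker.
donor_phrase_cmds = [(0.0, 0.5)]
donor_accent_cmds = [(0.3, 0.6, 0.4)]

# Hypothetical base frequency of the person whose voice is kept.
my_base_hz = 120.0

# Naive morphing sketch: own base frequency, borrowed intonation and accent.
morphed = f0_contour(my_base_hz, donor_phrase_cmds, donor_accent_cmds, duration_s=1.0)
```

The resulting list holds one F0 value in Hz per 10 ms frame. Because the commands enter additively in the log-frequency domain, swapping the command lists while keeping the base frequency changes the shape of the contour without changing the speaker's underlying pitch level.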

Presenters

Hirokazu Kameoka
Media Information Laboratory

Takuhiro Kaneko
Media Information Laboratory

Kunio Kashino
Media Information Laboratory

Aki Hayashi
Service Evolution Laboratories