Research Talk

Communication with a desired voice

- Deep generative models open the way to innovative speech transformation -

Kou Tanaka, Media Information Laboratory


By speaking, we convey our intentions and feelings and understand those of others. We also change the impression we give the listener by controlling aspects of the voice such as intonation, speaker characteristics, and rhythm. However, the range of voices an individual can produce is limited, and so is the degree to which that voice can be controlled. In this talk, we introduce the challenges of conventional speech transformation technology and our approaches to them, under the theme "What becomes possible when voice is combined with deep learning, which has advanced remarkably in recent years?" Finally, we look at the future of deep learning for speech generation and conversion.


