Through speech, we convey and understand intentions and feelings. We also shape the impression we make on a listener by controlling aspects of the voice such as intonation, speaker characteristics, and rhythm. Unfortunately, the range of voices an individual can produce is limited, and so is the degree to which that voice can be controlled. In this talk, we introduce the challenges of conventional speech transformation technology and our approaches to them, under the theme "What can be done when the voice is combined with deep learning, which has been developing remarkably in recent years?" Finally, we look at the future of deep learning and of speech generation and conversion.
Media Information Laboratory