WaveCycleGAN: Synthetic-to-Natural Speech Waveform Conversion Using Cycle-Consistent Adversarial Networks

Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Hirokazu Kameoka


Japanese audio samples (reported in the paper)

Due to licensing restrictions, we do not have permission to publish these audio samples.
(We plan to train models on an alternative Japanese database that permits publication.)

English audio samples

Conventional DNN-based TTS [1]
GAN-based postfilter over acoustic features [2]
(applied to Merlin's results)
GAN-based postfilter over speech waveform
(applied to Merlin's results)
Bonus tracks with enhanced formants
CMU Arctic Databases [3]
Training: 1000 sentences
Evaluation: 132 sentences

Male speaker: bdl
(Supported browsers: Safari, Chrome, Firefox, Opera)
Natural Merlin GANv Proposed Bonus

Female speaker: slt


  1. Zhizheng Wu, Oliver Watts, and Simon King, "Merlin: An Open Source Neural Network Speech Synthesis System," in Proc. 9th ISCA Speech Synthesis Workshop (SSW9), Sep. 2016.
  2. Takuhiro Kaneko, Hirokazu Kameoka, Nobukatsu Hojo, Yusuke Ijima, Kaoru Hiramatsu, and Kunio Kashino, "Generative Adversarial Network-based Postfilter for Statistical Speech Synthesis," in Proc. ICASSP, Mar. 2017.
  3. John Kominek and Alan W. Black, "The CMU Arctic Speech Databases," in Proc. 5th ISCA Speech Synthesis Workshop (SSW5), Jun. 2004.