Paper

WaveCycleGAN: Synthetic-to-Natural Speech Waveform Conversion Using Cycle-Consistent Adversarial Networks

Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Hirokazu Kameoka

 PDF

Japanese audio samples (reported in the paper)

Due to licensing restrictions, we do not have permission to publish these audio samples.
(We plan to train models on an alternative Japanese database whose licence allows publication.)

English audio samples

Systems
  Merlin: Conventional DNN-based TTS [1]
  GANv: GAN-based postfilter over acoustic features [2] (applied to Merlin's results)
  Proposed: GAN-based postfilter over the speech waveform (applied to Merlin's results)
  Bonus: Bonus tracks with formant enhancement
Database
  CMU Arctic Databases [3]
  Training: 1000 sentences
  Evaluation: 132 sentences
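
The 1000/132 split above can be sketched in a few lines of Python. This is only an illustration: the page does not say which sentences fall into each set, so assigning the first 1000 sorted utterance IDs to training and the next 132 to evaluation is an assumption, and the function name `split_utterances` and the ID format used in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple

N_TRAIN = 1000  # number of training sentences stated on this page
N_EVAL = 132    # number of evaluation sentences stated on this page


def split_utterances(
    utt_ids: Sequence[str],
    n_train: int = N_TRAIN,
    n_eval: int = N_EVAL,
) -> Tuple[List[str], List[str]]:
    """Split utterance IDs into disjoint training and evaluation lists.

    Assumption: IDs are sorted and the first n_train go to training,
    the following n_eval to evaluation. The actual assignment used in
    the experiments is not given on this page.
    """
    ordered = sorted(utt_ids)
    if len(ordered) < n_train + n_eval:
        raise ValueError(
            f"need at least {n_train + n_eval} utterances, got {len(ordered)}"
        )
    train = ordered[:n_train]
    evaluation = ordered[n_train:n_train + n_eval]
    return train, evaluation
```

For example, calling `split_utterances` on 1132 placeholder IDs such as `utt0001` through `utt1132` returns two non-overlapping lists of 1000 and 132 IDs. The same split would be applied per speaker (bdl and slt).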

Male speaker: bdl
(Supported browsers: Safari, Chrome, Firefox, Opera)
Natural Merlin GANv Proposed Bonus

Female speaker: slt

References

  1. Zhizheng Wu, Oliver Watts, and Simon King, "Merlin: An Open Source Neural Network Speech Synthesis System," in Proc. 9th ISCA Speech Synthesis Workshop (SSW9), Sep. 2016.
     web page
  2. Takuhiro Kaneko, Hirokazu Kameoka, Nobukatsu Hojo, Yusuke Ijima, Kaoru Hiramatsu, and Kunio Kashino, "Generative Adversarial Network-based Postfilter For Statistical Speech Synthesis," in ICASSP, Mar. 2017.
     web page
  3. John Kominek and Alan W Black, "The CMU Arctic Speech Databases," in Proc. 5th ISCA Speech Synthesis Workshop (SSW5), June 2004.
     web page