Paper

WaveCycleGAN: Synthetic-to-Natural Speech Waveform Conversion Using Cycle-Consistent Adversarial Networks

Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Hirokazu Kameoka
PDF

Japanese audio samples (reported in the paper)

Due to licensing restrictions, we do not have permission to publish these audio samples.
(We plan to train models on an alternative Japanese database whose licence permits publication.)

English audio samples

Systems

Merlin

Conventional DNN-based TTS [1]

GANv

GAN-based postfilter over acoustic features [2]
(applied to Merlin's results)

Proposed

GAN-based postfilter over speech waveform
(applied to Merlin's results)

Bonus

Bonus tracks with formant enhancement

Database

CMU Arctic Databases [3]

Training: 1000 sentences
Evaluation: 132 sentences

Male speaker: bdl
(Supported browsers: Safari, Chrome, Firefox, Opera)
Natural Merlin GANv Proposed Bonus

Female speaker: slt

References

[1] Zhizheng Wu, Oliver Watts, and Simon King, "Merlin: An Open Source Neural Network Speech Synthesis System," in Proc. 9th ISCA Speech Synthesis Workshop (SSW9), Sep. 2016.
web page

[2] Takuhiro Kaneko, Hirokazu Kameoka, Nobukatsu Hojo, Yusuke Ijima, Kaoru Hiramatsu, and Kunio Kashino, "Generative Adversarial Network-based Postfilter for Statistical Speech Synthesis," in Proc. ICASSP, Mar. 2017.
web page

[3] John Kominek and Alan W Black, "The CMU Arctic Speech Databases," in Proc. 5th ISCA Speech Synthesis Workshop (SSW5), June 2004.
web page