Acceleration and Stabilization of Training Procedure
We varied the hyperparameters \(\lambda_\mathrm{ga}\), \(\lambda_\mathrm{cpx}\), and \(\lambda_\mathrm{cpy}\) of our objective function.
Source speech:
Target speech:
Conversion results after 1,000 epochs of training
\(\lambda_\mathrm{ga}\) | \(\lambda_\mathrm{cpx}\) | \(\lambda_\mathrm{cpy}\)
-- | -- | --
1 | -- | --
1 (Failed) | -- | --
10,000 | 10 | 10
[Attention and Converted columns: media for each setting]
Additional results
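\(\lambda_\mathrm{ga}\) | \(\lambda_\mathrm{cpx}\) | \(\lambda_\mathrm{cpy}\)
-- | 10 | --
-- | -- | 10
-- | 10 | 10
10,000 | -- | --
10,000 | 10 | --
10,000 | -- | 10
[Attention and Converted columns: media for each setting]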
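One way to read the settings listed above is as weights on auxiliary terms added to a main loss. The sketch below illustrates that reading only. This page does not spell out the objective, so the weighted-sum form, the function name `total_loss`, the argument names, and the treatment of "--" as "term not used" (represented here by `None`) are all assumptions.

```python
from typing import Optional


def total_loss(
    main: float,
    ga: float,
    cpx: float,
    cpy: float,
    lambda_ga: Optional[float] = None,
    lambda_cpx: Optional[float] = None,
    lambda_cpy: Optional[float] = None,
) -> float:
    """Hypothetical weighted-sum objective.

    A weight of None stands for a "--" entry and is assumed to mean
    that the corresponding term is left out of the objective.
    """
    total = main
    for weight, term in ((lambda_ga, ga), (lambda_cpx, cpx), (lambda_cpy, cpy)):
        if weight is not None:
            total += weight * term
    return total


# Illustrative loss values only; they are not taken from any experiment.
baseline = total_loss(1.0, 0.001, 0.2, 0.1)
full = total_loss(1.0, 0.001, 0.2, 0.1, lambda_ga=10_000, lambda_cpx=10, lambda_cpy=10)
```

Under this reading, the row with all three entries set to "--" reduces to the main loss alone, and the row with 10,000, 10, and 10 enables all three auxiliary terms.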
References
Tomoki Toda, Alan W. Black, and Keiichi Tokuda,
"Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory,"
in IEEE Transactions on ASLP, 2007.
Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari,
"Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities,"
in Proc. INTERSPEECH, Aug. 2017.
John Kominek and Alan W Black,
"The CMU Arctic Speech Databases,"
in Proc. 5th ISCA Speech Synthesis Workshop (SSW5), June 2004.
Masanori Morise, Fumiya Yokomori, and Kenji Ozawa,
"WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications,"
in IEICE, 2016.
Daniel W. Griffin and Jae S. Lim,
"Signal Estimation from Modified Short-Time Fourier Transform,"
in IEEE Transactions on ASSP, 1984.