Generative Personal Assistance
with Audio and Visual Examples

Takuhiro Kaneko
NTT Communication Science Laboratories, NTT Corporation

Overview

We aim to develop a system that gives feedback or instructions to a person who wants to improve at a task or to try something new. Unlike existing personal assistance methods, which rely on manually defined rules, our system is designed to offer three advantages: (1) individuality, meaning that the output is tailored to each person; (2) concreteness, meaning that the instructions are concrete enough for users to understand easily; and (3) automaticity, meaning that the system carries out this process automatically. To this end, we propose several learning-based (specifically, deep learning-based) approaches. We believe that these approaches will also lead to a generic media generation technique that can meet a variety of demands in the near future.

[Paper] [Abstract] [Poster]
[Paper (Japanese)] [Abstract (Japanese)] [Poster (Japanese)]


Publications

Generative Controller

Generative Adversarial Image Synthesis with Decision Tree Latent Controller
Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino
CVPR 2018
[Paper] [Project] [Poster]

Generative Attribute Controller with Conditional Filtered Generative Adversarial Networks
Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino
CVPR 2017
[Paper] [Supplemental] [Project]

Feedback Generation

Adaptive Visual Feedback Generation for Facial Expression Improvement with Multi-task Deep Neural Networks
Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino
ACMMM 2016
[Paper]

Realistic Speech Synthesis & Voice Conversion

VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics
Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo
arXiv:2010.02977, Oct. 2020
[Paper] [Project]

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
Interspeech 2020 (arXiv:2010.11672, Oct. 2020)
[Paper] [Slides] [Project]

Non-Parallel Voice Conversion with Augmented Classifier Star Generative Adversarial Networks
Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo
arXiv:2008.12604, Aug. 2020
[Paper]

Many-to-Many Voice Transformer Network
Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda
arXiv:2005.08445, May 2020
[Paper]

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
Interspeech 2019 (arXiv:1907.12279, July 2019)
[Paper] [Project] [Poster]

WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation
Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo
arXiv:1904.02892, Apr. 2019
[Paper] [Project]

CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
ICASSP 2019 (arXiv:1904.04631, Apr. 2019)
[Paper] [Project] [Poster]

AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms
Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo
ICASSP 2019 (arXiv:1811.04076, Nov. 2018)
[Paper] [Project]

ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion
Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo
IEEE/ACM Trans. Audio Speech Lang. Process. (arXiv:1811.01609, Nov. 2018)
[Paper] [IEEE Xplore] [Project]

WaveCycleGAN: Synthetic-to-Natural Speech Waveform Conversion Using Cycle-Consistent Adversarial Networks
Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Hirokazu Kameoka
SLT 2018 (arXiv:1809.10288, Sept. 2018)
[Paper] [Project]

ACVAE-VC: Non-parallel Voice Conversion with Auxiliary Classifier Variational Autoencoder
(Alternative title: ACVAE-VC: Non-parallel Many-to-Many Voice Conversion with Auxiliary Classifier Variational Autoencoder)
Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo
IEEE/ACM Trans. Audio Speech Lang. Process. 27(9), Sept. 2019 (arXiv:1808.05092, Aug. 2018)
[Paper] [IEEE Xplore] [Project]

Automatic Speech Pronunciation Correction with Dynamic Frequency Warping-Based Spectral Conversion
Nobukatsu Hojo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko
EUSIPCO 2018
[Paper]

StarGAN-VC: Non-parallel Many-to-Many Voice Conversion with Star Generative Adversarial Networks
Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo
SLT 2018 (arXiv:1806.02169, June 2018)
[Paper] [Project]

Generative Adversarial Network-based Approach to Signal Reconstruction from Magnitude Spectrograms
Keisuke Oyamada, Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, Hiroyasu Ando
EUSIPCO 2018 (arXiv:1804.02181, Apr. 2018)
[Paper]

CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks
(Alternative title: Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks)
Takuhiro Kaneko, Hirokazu Kameoka
EUSIPCO 2018 (arXiv:1711.11293, Nov. 2017)
[Paper] [Project]

Non-native Speech Conversion with Consistency-Aware Recursive Network and Generative Adversarial Network
Keisuke Oyamada, Hirokazu Kameoka, Takuhiro Kaneko, Hiroyasu Ando, Kaoru Hiramatsu, Kunio Kashino
APSIPA ASC 2017
[Paper]

Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks
Takuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, Kunio Kashino
Interspeech 2017
[Paper]

Generative Adversarial Network-based Postfilter for STFT Spectrograms
Takuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
Interspeech 2017
[Paper] [Project]

Generative Adversarial Network-based Postfilter for Statistical Parametric Speech Synthesis
Takuhiro Kaneko, Hirokazu Kameoka, Nobukatsu Hojo, Yusuke Ijima, Kaoru Hiramatsu, Kunio Kashino
ICASSP 2017
[Paper]

Crossmodal

Crossmodal Voice Conversion
Hirokazu Kameoka, Kou Tanaka, Aaron Valero Puche, Yasunori Ohishi, Takuhiro Kaneko
arXiv:1904.04540, Apr. 2019
[Paper] [Project]


Review Paper

[Invited Review] Generative Adversarial Networks: Foundations and Applications
Takuhiro Kaneko
Acoustical Science and Technology 39(3), May 2018
[Paper] [Paper (Japanese)]

Generative Personal Assistance with Audio and Visual Examples
Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino
NTT Technical Review 15(11), Nov. 2017
[Paper] [Paper (Japanese)]


Talks & Exhibitions

[Tutorial] Foundations, Advances, and Applications of Generative Adversarial Networks
Takuhiro Kaneko
JSAI 2020 (in Japanese)
[Program (Japanese)] [Slides (Japanese)]

[Invited Talk] Generative Adversarial Image Synthesis with Decision Tree Latent Controller (CVPR 2018)
Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino
FIT 2019 (in Japanese)
[Program (Japanese)]

[Tutorial] Foundations, Advances, and Applications of Generative Adversarial Networks
Takuhiro Kaneko
MIRU 2019 (in Japanese)
[Abstract (Japanese)] [Slides (Japanese)]

[Invited Talk] Foundations, Advances, and Applications of Generative Adversarial Networks: From Image Generation to Speech Synthesis and Voice Conversion
Takuhiro Kaneko
75th JSAI Seminar (in Japanese)
[Abstract (Japanese)]

[Invited Talk] Generative Adversarial Networks: Foundations and Applications
Takuhiro Kaneko
JAMIT 2018 (in Japanese)
[Abstract (Japanese)]

Creating Favorite Images with Selective Decisions: Hierarchical Image Analysis and Synthesis with DTLC-GAN
Takuhiro Kaneko
NTT Communication Science Laboratories Open House 2018
[Poster] [Poster (Japanese)]

Free-Feature-Point Image Generation: Interactive and Flexible Image Generation with Deep Learning
Takuhiro Kaneko
NTT R&D Forum 2018
[Poster] [Poster (Japanese)]

[Invited Talk] Generative Attribute Controller with Conditional Filtered Generative Adversarial Networks (CVPR 2017)
Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino
MIRU 2017 (in Japanese)
[Program (Japanese)]

Generative Personal Assistance with Audio and Visual Examples: Deep Learning Opens the Way to Innovative Media Generation
Takuhiro Kaneko
NTT Communication Science Laboratories Open House 2017
[Paper] [Abstract] [Poster]
[Paper (Japanese)] [Abstract (Japanese)] [Poster (Japanese)]


Contact

Takuhiro Kaneko
NTT Communication Science Laboratories, NTT Corporation
takuhiro.kaneko.tb at hco.ntt.co.jp