Determined Source Separation with Multichannel Variational Autoencoder

Papers

Hirokazu Kameoka, Li Li, Shota Inoue, and Shoji Makino, "Supervised determined source separation with multichannel variational autoencoder," Neural Computation, vol. 31, no. 9, pp. 1891-1914, Sep. 2019. (PDF)
Li Li, Hirokazu Kameoka, Shota Inoue, and Shoji Makino, "FastMVAE: A fast optimization algorithm for the multichannel variational autoencoder method," IEEE Access, vol. 8, pp. 228740-228753, Dec. 2020. (PDF)
Li Li, Hirokazu Kameoka, and Shoji Makino, "FastMVAE2: On improving and accelerating the fast variational autoencoder-based source separation algorithm for determined mixtures," arXiv:2109.13496, Sep. 2021. (PDF)

MVAE

The MVAE method [1] is a variational autoencoder (VAE)-based source separation algorithm for determined speech mixtures. The basic idea is to model and estimate the power spectrogram of each speech signal in a mixture using a conditional VAE (CVAE) conditioned on a speaker code. To the best of our knowledge, this was the first method to incorporate the VAE concept into the multichannel source separation framework, although several similar attempts were made independently by other research groups around the same time.
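As a rough illustration of the determined setting (as many microphones as sources, as in the audio examples on this page), the sketch below applies one separation matrix per frequency bin and evaluates the negative log-likelihood commonly used in this family of methods, assuming a zero-mean complex Gaussian source model. The function names `demix` and `neg_log_likelihood` and the array layout are assumptions made for this example. In MVAE the variances `V` would come from the CVAE decoder; here they are simply passed in as an array.

```python
import numpy as np

def demix(W, X):
    """Apply a separation matrix at each frequency bin.

    W: (F, J, J) complex separation matrices (hypothetical layout)
    X: (F, N, J) complex mixture spectrogram, J microphones
    Returns Y: (F, N, J) separated signals, J sources.
    """
    return np.einsum("fjk,fnk->fnj", W, X)

def neg_log_likelihood(W, X, V):
    """Negative log-likelihood, up to an additive constant.

    Assumes each separated coefficient is zero-mean complex Gaussian
    with variance V[f, n, j] (the modeled source power spectrogram).
    """
    F, N, J = X.shape
    Y = demix(W, X)
    source_term = np.sum(np.log(V) + np.abs(Y) ** 2 / V)
    # slogdet works on a stack of matrices and returns log|det W(f)|
    _, logabsdet = np.linalg.slogdet(W)
    return source_term - 2.0 * N * np.sum(logabsdet)

# Toy usage: identity separation matrices and unit variances.
F, N, J = 3, 4, 2
W = np.tile(np.eye(J, dtype=complex), (F, 1, 1))
X = np.ones((F, N, J), dtype=complex)
V = np.ones((F, N, J))
value = neg_log_likelihood(W, X, V)
```

A separation algorithm of this kind alternates between updating `W` and updating the parameters that generate `V`, with both steps aiming to decrease this objective.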

FastMVAE and FastMVAE2

The FastMVAE [2] and FastMVAE2 [3] methods are faster versions of the MVAE method. One drawback of the MVAE method is the high computational cost of the backpropagation step in the separation-matrix estimation algorithm. To overcome this drawback, the FastMVAE method uses an auxiliary classifier VAE (ACVAE) to model the generative distribution of source spectrograms. With the ACVAE, the backpropagation step in the MVAE algorithm can be replaced by forward propagation through the pretrained networks, which significantly reduces the computational cost. The FastMVAE2 method further improves on the FastMVAE method by using a model called ChimeraACVAE, which increases separation accuracy while maintaining computational efficiency.
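The difference between the two update strategies can be illustrated with a deliberately simplified toy, which is not the networks or the algorithm from the papers. It assumes a hypothetical linear "decoder" `D` that maps a latent vector to log-variances, and uses its pseudo-inverse `E` as a stand-in for a pretrained encoder. The gradient is written by hand here. In a real network it would be obtained by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
F, K = 16, 4                            # frequency bins, latent size (toy values)
D = 0.3 * rng.standard_normal((F, K))   # hypothetical linear decoder: log-variance = D @ z
E = np.linalg.pinv(D)                   # hypothetical encoder, stand-in for a pretrained net

z_true = rng.standard_normal(K)
log_power = D @ z_true                  # observed log power of one separated frame

def nll(z):
    """Gaussian negative log-likelihood of the frame given z (up to a constant)."""
    log_var = D @ z
    return np.sum(log_var + np.exp(log_power - log_var))

def grad_nll(z):
    """Gradient of nll with respect to z (what backpropagation would compute)."""
    return D.T @ (1.0 - np.exp(log_power - D @ z))

def update_by_gradient(z, steps=200, lr=0.01):
    """Iterative latent update: one gradient evaluation per step."""
    for _ in range(steps):
        z = z - lr * grad_nll(z)
    return z

def update_by_forward_pass():
    """Latent estimate from a single encoder forward pass, no gradients."""
    return E @ log_power

z0 = np.zeros(K)
z_grad = update_by_gradient(z0)
z_fwd = update_by_forward_pass()
```

In this toy the forward pass recovers the latent vector exactly because the encoder is an exact inverse of the decoder. A trained encoder only approximates this, so the forward-pass update trades some accuracy per step for a much lower cost.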

Links to related pages

Please also refer to the following websites.

Links to our other work

Audio examples

Here are audio examples of the input mixture signals, ground truth source signals, and separated signals obtained with the MVAE and FastMVAE2 methods and a conventional method called ILRMA [4].

Each set of examples provides the mixtures (input), the sources (ground truth), and the separated signals obtained with ILRMA, MVAE, and FastMVAE2 (labeled fMVAE2).

Mixtures of 2 sources recorded by 2 microphones
Mixtures of 3 sources recorded by 3 microphones
Mixtures of 6 sources recorded by 6 microphones
Mixtures of 9 sources recorded by 9 microphones
Mixtures of 18 sources recorded by 18 microphones

References

[1] Hirokazu Kameoka, Li Li, Shota Inoue, and Shoji Makino, "Supervised determined source separation with multichannel variational autoencoder," Neural Computation, vol. 31, no. 9, pp. 1891-1914, Sep. 2019.

[2] Li Li, Hirokazu Kameoka, Shota Inoue, and Shoji Makino, "FastMVAE: A fast optimization algorithm for the multichannel variational autoencoder method," IEEE Access, vol. 8, pp. 228740-228753, Dec. 2020.

[3] Li Li, Hirokazu Kameoka, and Shoji Makino, "FastMVAE2: On improving and accelerating the fast variational autoencoder-based source separation algorithm for determined mixtures," arXiv:2109.13496, Sep. 2021.

[4] Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, and Hiroshi Saruwatari, "Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 9, pp. 1626-1641, Sep. 2016.

Audio examples of MVAE and FastMVAE2

Multichannel Variational Autoencoder Approach to
Determined Audio Source Separation

NTT Communication Science Laboratories, NTT Corporation
