Sound demo for joint denoising and dereverberation

This page contains speech signals sampled from the experiments presented in the following paper.
  1. T. Nakatani, N. Kamo, M. Delcroix, S. Araki, "Multi-stream diffusion model for probabilistic integration of model-based and data-driven speech enhancement," in Proceedings of the International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 65-69, September 2024.
The paper proposes a multi-stream diffusion model that integrates a diffusion model-based speech enhancement method, mSGMSE [1,2], with a blind dereverberation technique, Weighted Prediction Error (WPE) [3,4]. For further details, please refer to the paper.
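
For readers unfamiliar with WPE [3,4], the following is a minimal NumPy sketch of the variance-normalized delayed linear prediction it is built on, applied to a single STFT frequency bin. It is not the implementation used in the paper. The function name `wpe_one_bin`, the default tap, delay, and iteration counts, the diagonal loading, and the short-time averaging of the power estimate over neighboring frames (`context`) are all assumptions made for illustration.

```python
import numpy as np


def wpe_one_bin(x, taps=10, delay=3, iterations=3, context=5, eps=1e-10):
    """Dereverberate one STFT frequency bin with a WPE-style filter (sketch).

    x: complex array of shape (channels, frames).
    Returns the estimated desired signal with the same shape.
    Assumes more frames than 2 * context + 1.
    """
    num_ch, num_frames = x.shape

    # Stack delayed observations: block k holds x[:, t - delay - k].
    x_bar = np.zeros((num_ch * taps, num_frames), dtype=x.dtype)
    for k in range(taps):
        shift = delay + k
        if shift < num_frames:
            x_bar[k * num_ch:(k + 1) * num_ch, shift:] = x[:, :num_frames - shift]

    # Averaging window for the power estimate (an assumption of this sketch).
    kernel = np.ones(2 * context + 1) / (2 * context + 1)

    d = x.copy()  # initial estimate of the desired signal: the observation
    for _ in range(iterations):
        # Time-varying power of the current estimate, averaged over
        # channels and neighboring frames, floored to avoid division by zero.
        power = np.mean(np.abs(d) ** 2, axis=0)
        power = np.convolve(power, kernel, mode="same")
        lam = np.maximum(power, eps)

        # Variance-weighted correlation matrix and cross-correlation.
        x_w = x_bar / lam
        R = x_w @ x_bar.conj().T   # (channels*taps, channels*taps)
        P = x_w @ x.conj().T       # (channels*taps, channels)

        # Small diagonal loading for numerical stability (illustrative value).
        R = R + 1e-8 * np.trace(R).real / R.shape[0] * np.eye(R.shape[0])

        # Prediction filter, then subtract the predicted late reverberation.
        G = np.linalg.solve(R, P)
        d = x - G.conj().T @ x_bar
    return d
```

For a full STFT, such a function would be applied independently to each frequency bin. The output still contains the additive noise, which is consistent with the "Noise is not reduced" remark for the WPE samples on this page.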

Speech signals for comparison (please use headphones for optimal listening)

    Method                                | Attention                                          | Female speech | Male speech
    Recorded speech                       | Noisy and reverberant speech                       | [audio]       | [audio]
    WPE                                   | Dereverberated speech (noise is not reduced)       | [audio]       | [audio]
    Diffusion model                       | Denoised and dereverberated speech                 | [audio]       | [audio]
    WPE + Diffusion model (proposed method) | More accurately denoised and dereverberated speech | [audio]       | [audio]

Other related work

[1] Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann, "Speech Enhancement and Dereverberation with Diffusion-Based Generative Models," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2351-2364, 2023.
[2] Rino Kimura, Tomohiro Nakatani, Naoyuki Kamo, Marc Delcroix, Shoko Araki, Tetsuya Ueda, and Shoji Makino, "Diffusion model-based MIMO speech denoising and dereverberation," in Proc. Hands-free Speech Communication and Microphone Arrays (HSCMA), 2024.
[3] Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi, "Speech dereverberation based on variance-normalized delayed linear prediction," IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1717-1731, August 2010.
[4] Takuya Yoshioka and Tomohiro Nakatani, "Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening," IEEE Trans. Audio, Speech, and Language Processing, vol. 20, no. 10, pp. 2707-2720, 2012.
[5] Naoyuki Kamo, Marc Delcroix, and Tomohiro Nakatani, "Target speech extraction with conditional diffusion model," in Proc. Interspeech, pp. 176-180, 2023.