| Method | Output | Female speech | Male speech |
| --- | --- | --- | --- |
| Recorded speech | Noisy and reverberant speech | ![]() | ![]() |
| WPE | Dereverberated speech (noise is not reduced) | ![]() | ![]() |
| Diffusion model | Denoised and dereverberated speech | ![]() | ![]() |
| WPE + Diffusion model (proposed method) | More accurately denoised and dereverberated speech | ![]() | ![]() |
[1] | Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann, "Speech Enhancement and Dereverberation with Diffusion-Based Generative Models", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2351–2364, 2023. |
[2] | Rino Kimura, Tomohiro Nakatani, Naoyuki Kamo, Marc Delcroix, Shoko Araki, Tetsuya Ueda, and Shoji Makino, "Diffusion model-based MIMO speech denoising and dereverberation," in Proc. Hands-free Speech Communication and Microphone Arrays (HSCMA), 2024. |
[3] | Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi, "Speech dereverberation based on variance-normalized delayed linear prediction," IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1717–1731, August 2010. |
[4] | Takuya Yoshioka and Tomohiro Nakatani, "Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening," IEEE Trans. Audio, Speech, and Language Processing, vol. 20, no. 10, pp. 2707–2720, 2012. |
[5] | Naoyuki Kamo, Marc Delcroix, and Tomohiro Nakatani, "Target speech extraction with conditional diffusion model," in Proc. Interspeech, pp. 176–180, 2023. |