Methods | Description | SI-SDR (dB) | fwsSNR (dB) | ESTOI | PESQ | RTF |
Observation | No SE applied | -3.6 | 4.6 | 0.47 | 2.32 | |
detNN | A deterministic NN-based SE, trained to map distorted speech to clean speech | 6.3 | 11.4 | 0.83 | 2.32 | 0.021 |
mSGMSE+Ensemble | A diffusion model-based SE: a multi-stream extension of the Score-based Generative Model for SE, integrated with detNN and run with ensemble inference | 8.1 | 11.7 | 0.86 | 2.58 | 7.87 |
PDRE (proposed) | SE based on Probabilistic-Deterministic Recursive Enhancement | 8.4 | 12.7 | 0.87 | 2.56 | 0.077 |
Clean | Clean speech reference containing direct signal and early reflections within 2 ms after the direct signal | | | | | |
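
For readers unfamiliar with two of the column headings, the following is a minimal sketch of how SI-SDR and RTF are commonly defined. It is not the evaluation code behind the table: the function names `si_sdr` and `real_time_factor` and the toy inputs are illustrative assumptions, and the standard definitions are assumed (SI-SDR as the energy ratio, in dB, between the optimally scaled reference and the residual; RTF as processing time divided by audio duration).

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Hypothetical helper for illustration; assumes the standard definition.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference must not be all zeros")
    # Project the estimate onto the reference to get the optimal scale.
    alpha = sum(e * r for e, r in zip(estimate, reference)) / ref_energy
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0:
        return math.inf
    if target_energy == 0:
        return -math.inf
    return 10.0 * math.log10(target_energy / residual_energy)


def real_time_factor(processing_seconds, audio_seconds):
    """RTF: time spent processing divided by the duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Toy signals, not data from the table.
    reference = [1.0, -1.0, 1.0, -1.0]
    estimate = [1.0, -1.0, 1.0, 1.0]
    print(f"SI-SDR: {si_sdr(estimate, reference):.2f} dB")
    # Hypothetical timing: 0.5 s of compute for 10 s of audio.
    print(f"RTF: {real_time_factor(0.5, 10.0):.3f}")
```

Under this definition, rescaling the estimate leaves SI-SDR unchanged, and an RTF below 1 means the method runs faster than real time on the hardware used for timing.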