Musical noise is a typical problem with blind source separation using a time-frequency mask. In this paper, we report that a fine-shift and overlap-add method reduces the musical noise without degrading the separation performance. Its effectiveness was confirmed by the results of a listening test undertaken in a room with a reverberation time of 130 ms.
In the anechoic simulations, we simulated a pair of omni-directional microphones
with an inter-element spacing of 4 cm by applying delays to the original speech
signals.
The delay values corresponded to speech signals arriving from three directions:
45 deg. (s1), 90 deg. (s2), and 135 deg. (s3).
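As a rough illustration of how such delays could be derived, the sketch below computes the inter-microphone time difference for each direction under a far-field model and applies it as a fractional delay in the frequency domain. The speed of sound (340 m/s), the convention that angles are measured from the array axis, and the function names `tdoa_seconds` and `fractional_delay` are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def tdoa_seconds(theta_deg, spacing_m=0.04, c=340.0):
    """Far-field time difference of arrival between the two microphones.

    Assumption: the angle is measured from the array axis, so 90 deg
    (broadside) gives zero delay. c = 340 m/s is also an assumption.
    """
    return spacing_m * np.cos(np.deg2rad(theta_deg)) / c

def fractional_delay(x, delay_s, fs=8000):
    """Delay x by delay_s seconds using a linear phase shift.

    The shift is circular, which is acceptable for a sketch but would
    need zero-padding for real signals.
    """
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * delay_s)
    return np.fft.irfft(spectrum, n=n)

for name, theta in [("s1", 45), ("s2", 90), ("s3", 135)]:
    tau = tdoa_seconds(theta)
    print(f"{name}: {tau * 1e6:.1f} us = {tau * 8000:.3f} samples at 8 kHz")
```

Under these assumptions the delays are shorter than one sample at 8 kHz, which is why the sketch uses a phase shift rather than an integer sample shift.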
The sampling rate was 8 kHz and the FFT frame size was 512.
We varied the frame shift from 256 (= 512/2) to 64 (= 512/8) samples.
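To make the role of the frame shift concrete, here is a minimal sketch of a short-time Fourier transform followed by overlap-add resynthesis with a 512-point frame and the three shift values above. The Hann window, the squared-window normalization, the all-ones placeholder mask, and the names `stft` and `overlap_add` are assumptions for illustration; the page does not state which window or normalization was actually used.

```python
import numpy as np

FRAME = 512  # FFT frame size from the text

def stft(x, shift, frame=FRAME):
    """Windowed FFT frames, one row per frame (Hann window is an assumption)."""
    win = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // shift
    frames = np.stack([x[i * shift:i * shift + frame] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def overlap_add(spec, shift, frame=FRAME):
    """Inverse FFT each frame, window again, and overlap-add the frames."""
    win = np.hanning(frame)
    frames = np.fft.irfft(spec, n=frame, axis=1) * win
    out = np.zeros((len(frames) - 1) * shift + frame)
    norm = np.zeros_like(out)
    for i, fr in enumerate(frames):
        out[i * shift:i * shift + frame] += fr
        norm[i * shift:i * shift + frame] += win ** 2
    # Divide by the summed squared window so an unmasked signal is recovered.
    return out / np.maximum(norm, 1e-8)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)        # stand-in for a speech signal
for shift in (256, 128, 64):
    spec = stft(x, shift)
    mask = np.ones(spec.shape)       # placeholder; a separation mask would go here
    y = overlap_add(spec * mask, shift)
    err = np.max(np.abs(y[FRAME:-FRAME] - x[FRAME:-FRAME]))
    print(f"shift={shift}: {FRAME // shift} overlapping frames, max error {err:.2e}")
```

With a finer shift, each output sample is the normalized sum of more overlapping frames (2, 4, and 8 for the three settings), so frame-to-frame discontinuities introduced by a mask are averaged over more frames.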
The original speech signals were selected
from the ASJ continuous speech corpus. (All speech signals are in Japanese. Sorry.)
In the tables,
the values show the separation performance as (SIR, SDR) in dB,
where SIR is the signal-to-interference ratio and SDR is the signal-to-distortion ratio.
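The exact formulas behind these figures are not given on this page, and definitions of SIR and SDR vary between papers. As one common reading, the sketch below treats both as energy ratios in dB. It assumes that, in simulation, the target and interference contributions to a separated output can be computed separately, and that SDR compares a reference signal with its estimate; the function names are hypothetical.

```python
import numpy as np

def power_ratio_db(numerator, denominator):
    """10*log10 of the energy ratio between two signals."""
    return 10.0 * np.log10(np.sum(numerator ** 2) / np.sum(denominator ** 2))

def sir_db(target_part, interference_part):
    """Target energy over residual interference energy in one output (assumed definition)."""
    return power_ratio_db(target_part, interference_part)

def sdr_db(reference, estimate):
    """Reference energy over the energy of the estimation error (assumed definition)."""
    return power_ratio_db(reference, reference - estimate)

# Toy check with synthetic signals (not data from the experiments).
rng = np.random.default_rng(0)
target = rng.standard_normal(8000)
leak = 0.1 * rng.standard_normal(8000)
print(f"SIR = {sir_db(target, leak):.1f} dB")
print(f"SDR = {sdr_db(target, 0.9 * target):.1f} dB")
```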
For the echoic tests,
we used speech data
convolved with impulse responses recorded in a real room (see Fig. 2)
whose reverberation time was 130 ms.
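The convolution step can be sketched as follows: each microphone signal is the sum of every source convolved with the impulse response from that source to that microphone. The function name `mix_convolutive` is hypothetical, and the decaying-noise responses are synthetic placeholders, not the measured room responses.

```python
import numpy as np

def mix_convolutive(sources, impulse_responses):
    """Return microphone signals x_j = sum_i (h_ji * s_i).

    sources: list of 1-D arrays of equal length
    impulse_responses[j][i]: response from source i to microphone j
    (all responses are assumed to have the same length)
    """
    mics = []
    for h_row in impulse_responses:
        mics.append(sum(np.convolve(s, h) for s, h in zip(sources, h_row)))
    return np.stack(mics)

rng = np.random.default_rng(0)
fs = 8000
sources = [rng.standard_normal(fs) for _ in range(2)]   # stand-ins for speech
ir_len = 1040                                           # placeholder length only
t = np.arange(ir_len) / fs
# Placeholder responses: exponentially decaying noise, NOT measured data.
h = [[rng.standard_normal(ir_len) * np.exp(-t / 0.02) for _ in sources]
     for _ in range(2)]
x = mix_convolutive(sources, h)
print(x.shape)
```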
original | shift=256 (512/2) | shift=128 (512/4) | shift=64 (512/8) |
sample | (19.7, 7.9) sample | (21.0, 8.9) sample | (21.6, 9.4) sample |
sample | (15.0, 4.9) sample | (16.2, 5.5) sample | (16.6, 5.7) sample |
sample | (22.7, 8.5) sample | (24.3, 9.6) sample | (24.4, 9.9) sample |
original | shift=256 (512/2) | shift=128 (512/4) | shift=64 (512/8) |
sample | (11.8, 4.4) sample | (12.4, 5.3) sample | (12.8, 5.2) sample |
sample | (14.4, 3.1) sample | (16.0, 3.4) sample | (15.9, 3.6) sample |
sample | (17.0, 6.7) sample | (19.2, 7.3) sample | (19.8, 7.7) sample |
original | shift=256 (512/2) | shift=128 (512/4) | shift=64 (512/8) |
sample | (11.3, 8.3) sample | (11.7, 8.9) sample | (11.9, 9.0) sample |
sample | (12.1, 7.1) sample | (12.6, 7.5) sample | (12.9, 8.6) sample |
original | shift=256 (512/2) | shift=128 (512/4) | shift=64 (512/8) |
sample | (18.0, 3.2) sample | (19.3, 4.0) sample | (20.1, 4.6) sample |
sample | (16.4, 3.2) sample | (18.8, 4.0) sample | (20.0, 4.2) sample |