Demonstrations of Underdetermined BSS with an overlap-add method

last modified: Sept. 17, 2004.


1: Abstract

Musical noise is a typical problem in blind source separation with a time-frequency mask. In this paper, we report that a fine frame shift combined with overlap-add resynthesis reduces the musical noise without degrading the separation performance. The effectiveness was confirmed by a listening test conducted in a room with a reverberation time of 130 ms.
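The fine-shift overlap-add idea can be illustrated with a minimal NumPy sketch: the signal is cut into 512-sample frames advanced by a chosen shift, each frame's spectrum is multiplied by a time-frequency mask, and the frames are resynthesized by weighted overlap-add. The periodic Hann window, the normalization by the summed squared window, and the names `stft`, `istft_ola` and `apply_mask` are assumptions for illustration; the page does not specify the window or the exact resynthesis rule.

```python
import numpy as np


def hann(n_fft):
    """Periodic Hann window (an assumption; the page does not name a window)."""
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=512, shift=128):
    """Windowed frames of x advanced by `shift` samples, as rFFT spectra (frames x bins)."""
    win = hann(n_fft)
    n_frames = 1 + (len(x) - n_fft) // shift
    frames = np.stack([x[i * shift:i * shift + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft_ola(X, n_fft=512, shift=128):
    """Weighted overlap-add: window each inverse FFT again, sum, divide by the summed window**2."""
    win = hann(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    out_len = n_fft + (len(frames) - 1) * shift
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i, frame in enumerate(frames):
        start = i * shift
        y[start:start + n_fft] += frame * win
        norm[start:start + n_fft] += win ** 2
    covered = norm > 1e-8  # leave samples that (almost) no window reaches at zero
    y[covered] /= norm[covered]
    return y


def apply_mask(x, mask_fn, n_fft=512, shift=128):
    """Multiply every time-frequency bin of x by mask_fn(X) and resynthesize by overlap-add."""
    X = stft(x, n_fft, shift)
    return istft_ola(X * mask_fn(X), n_fft, shift)
```

With an all-ones mask the interior of the signal is reconstructed exactly for any of the three shifts. With shifts of 256, 128 and 64 samples, each interior output sample is a weighted average of 2, 4 and 8 overlapping masked frames, respectively; one plausible reading of the abstract is that this averaging smooths the frame-to-frame on/off switching of a mask that is heard as musical noise.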


2: Experimental conditions

In the anechoic simulations (TR=0ms), we simulated a pair of omnidirectional microphones with an inter-element spacing of 4 cm by applying delays to the original speech signals.
The delays corresponded to speech signals arriving from three directions: 45 deg. (s1), 90 deg. (s2), and 135 deg. (s3).
The sampling rate was 8 kHz and the FFT frame size was 512 samples (64 ms).
We varied the frame shift from 256 (= 512/2) through 128 (= 512/4) to 64 (= 512/8) samples.
The original speech signals were taken from the ASJ continuous speech corpus (all speech signals are in Japanese, sorry).
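The anechoic mixtures above can be sketched by delaying each source at the second microphone by its far-field inter-microphone delay, tau = d cos(theta) / c. The speed of sound c = 340 m/s, the angle convention (theta measured from the microphone axis), the FFT-based fractional delay and the helper names are assumptions; the page only gives the 4 cm spacing, the three directions and the 8 kHz rate.

```python
import numpy as np

C = 340.0   # speed of sound in m/s (assumed)
D = 0.04    # microphone spacing in m (from the text)
FS = 8000   # sampling rate in Hz (from the text)


def delay_samples(theta_deg, d=D, c=C, fs=FS):
    """Far-field delay of mic 2 relative to mic 1, in samples, for a source at theta_deg."""
    return fs * d * np.cos(np.deg2rad(theta_deg)) / c


def fractional_delay(x, delay):
    """Delay x by a possibly non-integer number of samples (circular, via an FFT phase shift)."""
    n = len(x)
    freqs = np.fft.rfftfreq(n)  # normalized frequency in cycles per sample
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * delay), n=n)


def simulate_pair(sources, angles_deg):
    """Two-mic anechoic mixture: mic 1 receives the sources as-is, mic 2 receives them delayed."""
    x1 = np.sum(sources, axis=0)
    x2 = np.sum([fractional_delay(s, delay_samples(a))
                 for s, a in zip(sources, angles_deg)], axis=0)
    return np.stack([x1, x2])
```

Under these assumptions the delays for 45, 90 and 135 deg. are about +0.67, 0 and -0.67 samples. Since c/(2d) = 4250 Hz lies above the 4 kHz Nyquist frequency, the inter-microphone phase difference does not wrap anywhere in the band, which keeps it unambiguous for a phase-based mask.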

In the tables, each pair of values gives the separation performance as (SIR, SDR) in dB.
SIR: signal-to-interference ratio; SDR: signal-to-distortion ratio.
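Both measures are energy ratios in dB. The following is a minimal sketch, assuming the separated output has already been split into its target-source component and its interference component (possible in simulation, where each source can be passed through the separation system on its own), and assuming SDR compares the target component with the reference source without any gain or delay compensation. The page does not give its exact definitions, and published variants often include such compensation.

```python
import numpy as np


def sir_db(target_part, interference_part):
    """Signal-to-interference ratio: target-component energy over interference energy, in dB."""
    return 10 * np.log10(np.sum(target_part ** 2) / np.sum(interference_part ** 2))


def sdr_db(reference, target_part):
    """Signal-to-distortion ratio: reference energy over energy of (reference - target_part), in dB.

    Assumes reference and target_part are already time-aligned and equally scaled.
    """
    return 10 * np.log10(np.sum(reference ** 2) / np.sum((reference - target_part) ** 2))
```

For example, a target component with ten times the amplitude of the residual interference gives an SIR of 20 dB.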

For the echoic tests, we used speech data convolved with impulse responses recorded in a real room (see Fig. 2) whose reverberation time (TR) was 130 ms.

Fig. 2: Room for echoic tests.
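In the echoic tests each microphone signal is the sum of the sources convolved with the corresponding impulse responses. The recorded responses are not available here, so the sketch uses a hypothetical stand-in: exponentially decaying noise whose amplitude falls by 60 dB after RT60 = 130 ms. `toy_rir` and `reverberant_mixture` are illustrative names, and the synthetic responses do not reproduce the real room in Fig. 2.

```python
import numpy as np


def toy_rir(rt60=0.13, fs=8000, length_s=0.2, seed=0):
    """Hypothetical stand-in for a recorded room impulse response.

    Gaussian noise under an exponential envelope that is 60 dB (a factor of 1000 in
    amplitude) down at t = rt60, with a unit direct-path tap at t = 0.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(length_s * fs)) / fs
    envelope = np.exp(-np.log(1000.0) * t / rt60)
    h = rng.standard_normal(len(t)) * envelope
    h[0] = 1.0
    return h


def reverberant_mixture(sources, rirs):
    """x_m = sum_k (h_mk * s_k), where rirs[m][k] is the response from source k to mic m."""
    n = max(len(s) for s in sources) + max(len(h) for row in rirs for h in row) - 1
    x = np.zeros((len(rirs), n))
    for m, row in enumerate(rirs):
        for s, h in zip(sources, row):
            y = np.convolve(s, h)
            x[m, :len(y)] += y
    return x
```

Replacing the toy responses with measured ones, one per source-microphone pair, gives the echoic observations the tests describe.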


3: Applied to a binary mask method.

TR=0ms

original	shift=256 (512/2)	shift=128 (512/4)	shift=64 (512/8)
sample	(19.7, 7.9) sample	(21.0, 8.9) sample	(21.6, 9.4) sample
sample	(15.0, 4.9) sample	(16.2, 5.5) sample	(16.6, 5.7) sample
sample	(22.7, 8.5) sample	(24.3, 9.6) sample	(24.4, 9.9) sample

TR=130ms

original	shift=256 (512/2)	shift=128 (512/4)	shift=64 (512/8)
sample	(11.8, 4.4) sample	(12.4, 5.3) sample	(12.8, 5.2) sample
sample	(14.4, 3.1) sample	(16.0, 3.4) sample	(15.9, 3.6) sample
sample	(17.0, 6.7) sample	(19.2, 7.3) sample	(19.8, 7.7) sample
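The page does not describe the binary-mask algorithm itself. As one common example of the approach (a delay-based mask in the spirit of DUET, named here as a substitute and not as the authors' method), each time-frequency bin can be assigned to the source whose inter-microphone delay best explains the observed phase difference between the two microphones. The sketch assumes the candidate delays are known (e.g. from the anechoic geometry) rather than clustered from the data; `binary_masks` and `separate` are hypothetical names.

```python
import numpy as np


def binary_masks(X1, X2, delays, n_fft=512):
    """Assign every time-frequency bin to the source whose delay best matches its phase difference.

    X1, X2 : STFTs of microphones 1 and 2, shape (frames, bins), bins as from np.fft.rfft.
    delays : delay of mic 2 relative to mic 1 for each source, in samples (assumed known).
    Returns float masks of shape (n_sources, frames, bins) with exactly one 1 per bin.
    """
    n_bins = X1.shape[1]
    f = np.arange(n_bins) / n_fft              # normalized frequency, cycles per sample
    phase = np.angle(X2 * np.conj(X1))         # phase of X2 / X1, in (-pi, pi]
    # A delay of tau samples gives X2 = X1 * exp(-2j*pi*f*tau), so tau = -phase / (2*pi*f).
    # This assumes |2*pi*f*tau| < pi (no phase wrapping). The DC bin carries no delay
    # information and is given an estimate of 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        tau = np.where(f > 0, -phase / (2 * np.pi * f), 0.0)
    dist = np.abs(tau[None, :, :] - np.asarray(delays, dtype=float)[:, None, None])
    owner = np.argmin(dist, axis=0)            # best-matching source index per bin
    return (owner[None, :, :] == np.arange(len(delays))[:, None, None]).astype(float)


def separate(X1, masks):
    """Masked copies of mic 1's STFT, one per source, ready for overlap-add resynthesis."""
    return masks * X1[None, :, :]
```

Each masked STFT is then inverted by overlap-add to give one source estimate. Because every bin goes to exactly one source, the masks sum to one across sources, so the masked spectra add back up to the first microphone's STFT.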


4: Applied to a binary mask + ICA method. [1]

TR=130ms

original	shift=256 (512/2)	shift=128 (512/4)	shift=64 (512/8)
sample	(11.3, 8.3) sample	(11.7, 8.9) sample	(11.9, 9.0) sample
sample	(12.1, 7.1) sample	(12.6, 7.5) sample	(12.9, 8.6) sample


5: Applied to Roman's method [2]

TR=130ms

original	shift=256 (512/2)	shift=128 (512/4)	shift=64 (512/8)
sample	(18.0, 3.2) sample	(19.3, 4.0) sample	(20.1, 4.6) sample
sample	(16.4, 3.2) sample	(18.8, 4.0) sample	(20.0, 4.2) sample
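To summarize the tables, the sketch below transcribes the (SIR, SDR) values above and averages, per table, the change from the coarsest shift (256) to the finest (64). It introduces no new data; the dictionary layout and the name `mean_change` are only for illustration.

```python
# (SIR, SDR) in dB, transcribed from the tables above.
# Each row lists the shift=256, shift=128 and shift=64 columns, in that order.
TABLES = {
    "binary mask, TR=0ms": [
        [(19.7, 7.9), (21.0, 8.9), (21.6, 9.4)],
        [(15.0, 4.9), (16.2, 5.5), (16.6, 5.7)],
        [(22.7, 8.5), (24.3, 9.6), (24.4, 9.9)],
    ],
    "binary mask, TR=130ms": [
        [(11.8, 4.4), (12.4, 5.3), (12.8, 5.2)],
        [(14.4, 3.1), (16.0, 3.4), (15.9, 3.6)],
        [(17.0, 6.7), (19.2, 7.3), (19.8, 7.7)],
    ],
    "binary mask + ICA, TR=130ms": [
        [(11.3, 8.3), (11.7, 8.9), (11.9, 9.0)],
        [(12.1, 7.1), (12.6, 7.5), (12.9, 8.6)],
    ],
    "Roman's method, TR=130ms": [
        [(18.0, 3.2), (19.3, 4.0), (20.1, 4.6)],
        [(16.4, 3.2), (18.8, 4.0), (20.0, 4.2)],
    ],
}


def mean_change(rows, first=0, last=-1):
    """Mean (SIR, SDR) change in dB from column `first` to column `last` over a table's rows."""
    n = len(rows)
    d_sir = sum(row[last][0] - row[first][0] for row in rows) / n
    d_sdr = sum(row[last][1] - row[first][1] for row in rows) / n
    return d_sir, d_sdr


if __name__ == "__main__":
    for name, rows in TABLES.items():
        d_sir, d_sdr = mean_change(rows)
        print(f"{name}: SIR {d_sir:+.2f} dB, SDR {d_sdr:+.2f} dB")
```

This gives average SIR/SDR changes of about +1.73/+1.23 dB for the binary mask at TR=0ms, +1.77/+0.77 dB at TR=130ms, +0.70/+1.10 dB for the binary mask + ICA method and +2.85/+1.20 dB for Roman's method. Every shift=64 entry is at least as good as the corresponding shift=256 entry, consistent with the claim that the finer shift does not degrade separation, although two entries dip slightly between shift=128 and shift=64 (SDR 5.3 to 5.2 dB and SIR 16.0 to 15.9 dB at TR=130ms).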