Demonstrations of Underdetermined Blind Separation of Speech Signals

last modified: Oct. 1, 2003.


Contents

1: Abstract
2: References
3: Results (simulated anechoic mixtures)
4: Results (echoic mixtures)
5: Discussions

1: Abstract


Fig. 1: Block diagram of underdetermined BSS.
We propose a method for separating speech signals with little distortion when the signals outnumber the sensors. Several methods have already been proposed for solving this underdetermined problem, and some of them exploit the sparseness of speech signals. These methods employ binary masks that extract a signal at time points where only one source is estimated to be active. However, they produce an unexpectedly large amount of zero-padding, so the extracted speech signals are severely distorted and contain loud musical noise. In this paper, we propose combining a sparseness approach with independent component analysis (ICA). First, using sparseness, we estimate the time points when only one source is active. Then, we remove this single source from the observations and apply ICA to the remaining mixtures. Experimental results show that our proposed sparseness and ICA (SPICA) method can separate signals with little distortion, even under reverberant conditions, without any serious deterioration in separation performance as measured by SIR.
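The first two steps above (find points where one source dominates, extract it with a binary mask, then remove it from the observations) can be illustrated for a two-microphone pair. The sketch below is not the authors' detector: it uses a generic stand-in that assigns each time-frequency point an arrival direction from the inter-microphone phase difference and masks the points near one direction. The far-field delay model, the speed of sound (340 m/s), the 10 deg. mask width, and the function names `doa_per_bin` and `binary_mask` are all assumptions for illustration, and the ICA step is not shown.

```python
import numpy as np

def doa_per_bin(X1, X2, freqs, d=0.04, c=340.0):
    """Estimate an arrival direction (degrees) for every time-frequency point.

    X1, X2: complex STFTs of microphones 1 and 2, shape (n_freq, n_frames).
    freqs:  frequency of each STFT row in Hz, shape (n_freq,).
    Assumption: a far-field source reaches microphone 2 delayed by
    tau = d*cos(theta)/c relative to microphone 1 (c = 340 m/s assumed).
    """
    f = np.broadcast_to(np.asarray(freqs, dtype=float)[:, None], X1.shape)
    phase = np.angle(X2 * np.conj(X1))          # equals -2*pi*f*tau for one source
    theta = np.full(X1.shape, np.nan)           # DC bins carry no phase information
    pos = f > 0
    tau = -phase[pos] / (2 * np.pi * f[pos])
    cos_theta = np.clip(c * tau / d, -1.0, 1.0)
    theta[pos] = np.degrees(np.arccos(cos_theta))
    return theta

def binary_mask(theta, target_deg, width_deg=10.0):
    """True where the estimated direction is within width_deg of target_deg (NaN -> False)."""
    return np.abs(theta - target_deg) < width_deg

# Usage on a synthetic single-source STFT arriving from 50 deg.
rng = np.random.default_rng(0)
freqs = np.linspace(0, 4000, 129)               # 8 kHz sampling -> 0..4 kHz
X1 = rng.standard_normal((129, 50)) + 1j * rng.standard_normal((129, 50))
tau = 0.04 * np.cos(np.radians(50)) / 340.0
X2 = X1 * np.exp(-2j * np.pi * freqs[:, None] * tau)

theta = doa_per_bin(X1, X2, freqs)
mask = binary_mask(theta, 50.0)
Y = np.where(mask, X1, 0)                               # extracted source (mic 1 STFT)
R1, R2 = np.where(mask, 0, X1), np.where(mask, 0, X2)   # observations with it removed
```

With a 4 cm spacing, the phase difference stays inside (-pi, pi) for all frequencies below c/(2d) = 4250 Hz, which is above the 4 kHz Nyquist limit at 8 kHz sampling, so the arccos above is unambiguous. In a real mixture, R1 and R2 would still contain the remaining sources, which is where the abstract applies ICA.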


2: References


3: Results (simulated anechoic mixtures)

We simulated a pair of omnidirectional microphones spaced 4 cm apart by applying delays to the original speech signals.
The delays corresponded to sources arriving from three directions: 50 deg. (s1), 100 deg. (s2), and 135 deg. (s3).
The sampling rate was 8 kHz.
The original speech signals were selected from the ASJ continuous speech corpus. (All speech signals are in Japanese. Sorry...)
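A minimal sketch of how such anechoic two-microphone mixtures can be simulated is shown below. The page does not say how the delays were applied, so several choices here are assumptions: the speed of sound is taken as 340 m/s, angles are measured from the axis of the microphone pair, microphone 1 is the zero-delay reference, there is no attenuation, and the (fractional) delay is applied as a circular phase shift in the FFT domain. The names `pair_delay`, `fractional_delay`, and `simulate_anechoic` are hypothetical.

```python
import numpy as np

FS = 8000      # sampling rate from the text (8 kHz)
D = 0.04       # microphone spacing from the text (4 cm)
C = 340.0      # assumed speed of sound in m/s

def pair_delay(theta_deg, d=D, c=C):
    """Far-field delay (s) of microphone 2 relative to microphone 1 for direction theta."""
    return d * np.cos(np.radians(theta_deg)) / c

def fractional_delay(x, tau, fs=FS):
    """Delay x by tau seconds (possibly a fraction of a sample) via an FFT phase shift.

    The shift is circular, so zero-pad x first if wrap-around at the ends matters.
    """
    n = len(x)
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * f * tau), n=n)

def simulate_anechoic(sources, angles_deg, fs=FS):
    """Return a (2, T) array: mic 1 is the plain sum, mic 2 the sum of delayed sources."""
    sources = np.asarray(sources, dtype=float)
    x1 = sources.sum(axis=0)
    x2 = sum(fractional_delay(s, pair_delay(a), fs) for s, a in zip(sources, angles_deg))
    return np.vstack([x1, x2])

# Usage with random stand-ins for s1, s2, s3 at 50, 100 and 135 deg.
rng = np.random.default_rng(1)
s = rng.standard_normal((3, 1024))
X = simulate_anechoic(s, [50, 100, 135])
```

Because the largest possible delay is 0.04 / 340 s, or about 0.94 samples at 8 kHz, every delay in this setup is a fraction of a sample, which is why an integer sample shift would not be enough here.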


In the tables, each entry shows the separation performance as (SIR, SDR) in dB.
SIR: Signal-to-Interference Ratio; SDR: Signal-to-Distortion Ratio.
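The page does not give formulas for these measures. The sketch below uses one common form, stated here as an assumption: an output y is split into the part coming from its target source (`y_target`) and the part coming from the other sources (`y_interf`), SIR compares their energies, and SDR compares the target's image at a reference microphone (`x_ref`) with the error left after matching the gain of `y_target` by least squares. Some evaluations also search for a time alignment before computing SDR; that is omitted here. The function names are hypothetical.

```python
import numpy as np

def sir_db(y_target, y_interf):
    """SIR (dB): energy of an output's target component over its interference component."""
    return 10 * np.log10(np.sum(y_target ** 2) / np.sum(y_interf ** 2))

def sdr_db(x_ref, y_target):
    """SDR (dB): reference image energy over the error of the gain-matched target component.

    The least-squares gain alpha makes the measure insensitive to output scaling
    (an assumption; no delay alignment is performed).
    """
    alpha = np.dot(x_ref, y_target) / np.dot(y_target, y_target)
    err = x_ref - alpha * y_target
    return 10 * np.log10(np.sum(x_ref ** 2) / np.sum(err ** 2))

# Usage: a target 10x louder in amplitude than the interference gives 20 dB SIR.
print(sir_db(10 * np.ones(4), np.ones(4)))
```

High SIR with low SDR is exactly the situation the abstract describes for pure binary masking: interference is suppressed, but the zero-padded target is distorted.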



4: Results (echoic mixtures)


Fig. 2: Room for echoic tests.

For the echoic tests, we recorded each speech signal separately in a real room (see Fig. 2) with a reverberation time of 130 ms, and added the recordings to obtain the mixtures.
So far, we have tested the performance only for the female-male-male combination.
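Forming the mixtures by adding separately recorded signals means each source's contribution (its "image" at each microphone) is known. One practical consequence, offered here as our reading rather than something the page states, is that the same fixed linear separation system can be applied to each image individually, and by linearity the per-source outputs add up to the output for the mixture, which gives the target and interference components needed for SIR and SDR. The sketch below uses an instantaneous demixing matrix `W` purely for illustration (real echoic separation is convolutive, typically done per frequency bin); `mix_images` and `output_components` are hypothetical names.

```python
import numpy as np

def mix_images(images):
    """images: (n_src, n_mic, T), one separately recorded image per source.
    Returns the (n_mic, T) mixtures obtained by adding them."""
    return np.asarray(images).sum(axis=0)

def output_components(W, images):
    """Pass each source image through the same fixed linear separation matrix W
    (n_out, n_mic). Returns (n_src, n_out, T): each source's contribution to each output."""
    return np.einsum('om,smt->sot', W, np.asarray(images))

# Usage with random stand-ins for three recorded images at two microphones.
rng = np.random.default_rng(2)
imgs = rng.standard_normal((3, 2, 500))
W = rng.standard_normal((2, 2))            # hypothetical demixing matrix
X = mix_images(imgs)
comps = output_components(W, imgs)         # comps[k, i] is source k's part of output i
```

For output i with target source k, `comps[k, i]` plays the role of the target component and the sum of `comps[j, i]` over j != k the interference component.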



5: Discussions

We can say that...
