Demonstrations of Underdetermined Blind Separation of Speech Signals
last modified: Oct. 1, 2003.
Contents
1: Abstract
2: References
3: Results (simulated anechoic mixtures)
4: Results (echoic mixtures)
5: Discussions

1: Abstract

Fig. 1: Block diagram of underdetermined BSS.
We propose a method for separating speech signals with little distortion
when the signals outnumber the sensors.
Several methods have already been proposed
for solving the underdetermined problem,
and some of these utilize the sparseness of speech signals.
These methods employ binary masks that extract a signal at time points
where the number of active sources is estimated to be only one.
However, these masks introduce excessive zero-padding into the extracted signals,
so the extracted speech is severely distorted and
contains loud musical noise.
In this paper, we propose combining a sparseness approach and
independent component analysis (ICA).
First, using sparseness,
we estimate the time points when only one source is active.
Then,
we remove this single source from the observations
and apply ICA to the remaining mixtures.
Experimental results show that our proposed sparseness and ICA (SPICA)
method can
separate signals with little distortion even under reverberant conditions,
without any serious deterioration in the separation performance
measured by the signal-to-interference ratio (SIR).
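The first stage described above can be illustrated with a minimal sketch of a binary mask at a single time-frequency point. It assumes a two-microphone anechoic model in which the second microphone receives a source from direction theta with a delay of d*cos(theta)/c. The 4 cm spacing is taken from the simulated setup on this page; the speed of sound (340 m/s), the tolerance, and all function names are illustrative assumptions, not the authors' implementation.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s (assumed value, not stated on this page)
MIC_SPACING = 0.04   # m (4 cm microphone pair)


def estimate_doa_deg(x1, x2, freq_hz):
    """Estimate the direction of arrival at one time-frequency point.

    x1 and x2 are the complex STFT values at the two microphones.
    Assumed model: x2 = x1 * exp(-j*2*pi*f*tau), tau = d*cos(theta)/c.
    """
    phase = cmath.phase(x1 * x2.conjugate())  # equals 2*pi*f*tau
    cos_theta = phase * SOUND_SPEED / (2.0 * math.pi * freq_hz * MIC_SPACING)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp against noise
    return math.degrees(math.acos(cos_theta))


def binary_mask(x1, x2, freq_hz, target_deg, tol_deg=10.0):
    """Return 1 if the point appears dominated by the target direction."""
    doa = estimate_doa_deg(x1, x2, freq_hz)
    return 1 if abs(doa - target_deg) <= tol_deg else 0


# One point at 1 kHz dominated by a single source at 50 degrees.
tau = MIC_SPACING * math.cos(math.radians(50.0)) / SOUND_SPEED
x1 = 1.0 + 0.5j
x2 = x1 * cmath.exp(-2j * math.pi * 1000.0 * tau)
doa = estimate_doa_deg(x1, x2, 1000.0)
keep_for_s1 = binary_mask(x1, x2, 1000.0, target_deg=50.0)
keep_for_s3 = binary_mask(x1, x2, 1000.0, target_deg=135.0)
```

Under the assumed speed of sound, a 4 cm spacing keeps the phase difference unambiguous up to c/(2d) = 4250 Hz, which lies above the 4 kHz Nyquist frequency of 8 kHz sampling.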
2: References
- About the Methodology
  - S. Araki, S. Makino, A. Blin, R. Mukai and H. Sawada, "Blind Separation of More Speech than Sensors with Less Distortion by Combining Sparseness and ICA," IWAENC2003, pp. 271-274, 2003.
  - S. Araki, R. Mukai, H. Sawada and S. Makino, "Blind Separation of more signals than sensors combining binary-masks and ICA," the 2003 Autumn Meeting of Acoustical Society of Japan, pp. 587-588, 2003. (In Japanese, with a Japanese title.)
- About the Sparseness Assessments
  - A. Blin, S. Araki and S. Makino, "Blind Source Separation when Speech Signals Outnumber Sensors using a Sparseness-Mixing Matrix Estimation," IWAENC2003, pp. 211-214, 2003.
  - S. Araki, A. Blin and S. Makino, "Blind Separation of More Speech Signals than Sensors using Time-frequency Masking and Mixing Matrix Estimation," the 2003 Autumn Meeting of Acoustical Society of Japan, pp. 585-586, 2003.
3: Results (simulated anechoic mixtures)
We simulated a pair of omnidirectional microphones
with an inter-element spacing of 4 cm by applying delays to the original speech
signals.
The delays corresponded to speech arriving from three directions:
50 deg. (s1), 100 deg. (s2), and 135 deg. (s3).
The sampling rate was 8 kHz.
The original speech signals were selected
from the ASJ continuous speech corpus. (All speech signals are in Japanese.)
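As a rough illustration of this setup, the sketch below computes the inter-microphone delay for each direction and mixes the sources at a single frequency bin. The speed of sound (340 m/s), the far-field delay model d*cos(theta)/c, the choice of mic1 as the reference, and all names are assumptions made for illustration; the page does not state how the delays were applied.

```python
import cmath
import math

SOUND_SPEED = 340.0   # m/s (assumed; not stated in the text)
MIC_SPACING = 0.04    # m
SAMPLE_RATE = 8000.0  # Hz
DIRECTIONS_DEG = {"s1": 50.0, "s2": 100.0, "s3": 135.0}


def delay_seconds(theta_deg):
    """Delay of mic2 relative to mic1 for a far-field source at theta_deg."""
    return MIC_SPACING * math.cos(math.radians(theta_deg)) / SOUND_SPEED


def mix_bin(source_values, freq_hz):
    """Mix the sources at one frequency bin and return (mic1, mic2).

    source_values maps a source name to its complex spectrum value.
    """
    mic1 = sum(source_values.values())
    mic2 = sum(
        value
        * cmath.exp(-2j * math.pi * freq_hz * delay_seconds(DIRECTIONS_DEG[name]))
        for name, value in source_values.items()
    )
    return mic1, mic2


# Delays expressed in samples at 8 kHz; a negative value means the
# wavefront reaches mic2 first under the assumed geometry.
delays_in_samples = {
    name: delay_seconds(theta) * SAMPLE_RATE
    for name, theta in DIRECTIONS_DEG.items()
}
mic1, mic2 = mix_bin({"s1": 1.0 + 0j, "s2": 0.5j, "s3": -0.25 + 0j}, 1000.0)
```

With these assumptions every delay is shorter than one sample, so a time-domain simulation would need fractional-delay filtering; the sketch sidesteps that by applying the delays as phase shifts in the frequency domain.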
In the tables,
- Sparse: with only sparseness,
- Case 1: with SPICA (s1 was removed in the 1st stage and s2 and s3 were separated in the 2nd stage),
- Case 2: with SPICA (s3 was removed in the 1st stage and s1 and s2 were separated in the 2nd stage).
Each table entry shows the separation performance as (SIR, SDR) in dB,
where SIR is the signal-to-interference ratio and SDR is the signal-to-distortion ratio.
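For readers unfamiliar with these measures, the following is a minimal sketch of how such ratios are commonly computed. The exact definitions behind the table values are given in the referenced papers and may differ, for example in how the reference signal is scaled and time-aligned; the function names and the decomposition of an output into target and interference parts are assumptions for illustration.

```python
import math


def power(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


def sir_db(target_part, interference_part):
    """SIR: power of the target component of an output over the power
    of the residual interference component, in dB."""
    return 10.0 * math.log10(power(target_part) / power(interference_part))


def sdr_db(reference, estimate):
    """SDR (one common form): power of the reference over the power of
    the difference between the reference and the estimate, in dB."""
    error = [r - e for r, e in zip(reference, estimate)]
    return 10.0 * math.log10(power(reference) / power(error))


# Toy signals: the interference is 10 times smaller in amplitude than
# the target, and the estimate is a 10 % attenuated copy of the reference.
target = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
interference = [0.1 * v for v in reversed(target)]
estimate = [0.9 * v for v in target]

example_sir = sir_db(target, interference)
example_sdr = sdr_db(target, estimate)
```

In this toy case both ratios come out at about 20 dB, since an amplitude ratio of 10 corresponds to a power ratio of 100.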
- female-male-male combination
- original speech:
s1,
s2,
s3
- mixtures:
mic1,
mic2
- separated signals:
        | y1           | y2           | y3           |
Sparse  | (17.6, 7.3)  | (11.6, 9.3)  | (17.3, 8.5)  |
Case 1  | -            | (8.8, 12.5)  | (13.6, 16.2) |
Case 2  | (17.5, 20.8) | (8.8, 11.8)  | -            |
- male-male-male combination
- original speech:
s1,
s2,
s3
- mixtures:
mic1,
mic2
- separated signals:
        | y1           | y2           | y3           |
Sparse  | (13.1, 4.3)  | (7.7, 8.1)   | (15.6, 4.6)  |
Case 1  | -            | (4.5, 9.6)   | (10.4, 10.0) |
Case 2  | (12.6, 17.7) | (4.0, 13.5)  | -            |
- female-female-female combination
- original speech:
s1,
s2,
s3
- mixtures:
mic1,
mic2
- separated signals:
        | y1           | y2           | y3           |
Sparse  | (23.6, 8.5)  | (11.3, 11.9) | (18.2, 8.5)  |
Case 1  | -            | (8.0, 15.3)  | (13.0, 15.6) |
Case 2  | (16.5, 21.5) | (8.0, 14.2)  | -            |
4: Results (echoic mixtures)

Fig. 2: Room for echoic tests.
For the echoic tests,
we recorded each speech signal separately in a real room (see Fig. 2)
with a reverberation time of 130 ms and summed the recordings to obtain the mixtures.
So far, we have tested the performance
only for the female-male-male combination.
- Positions: 50 deg. (s1, female), 100 deg. (s2, male), 135 deg. (s3, male)
- mixtures:
mic1,
mic2
- separated signals:
        | y1           | y2           | y3           |
Sparse  | (9.9, 4.0)   | (4.7, 8.3)   | (8.3, 4.8)   |
Case 1  | -            | (4.1, 9.1)   | (9.6, 6.9)   |
Case 2  | (9.0, 9.4)   | (3.3, 10.3)  | -            |
- Positions: 45 deg. (s1, female), 90 deg. (s2, male), 135 deg. (s3, male)
- mixtures:
mic1,
mic2
- separated signals:
        | y1           | y2           | y3           |
Sparse  | (12.1, 4.3)  | (5.8, 9.7)   | (13.8, 3.5)  |
Case 1  | -            | (5.3, 10.9)  | (12.1, 8.4)  |
Case 2  | (8.4, 7.6)   | (4.2, 11.5)  | -            |
5: Discussions
- With sparseness only, the SIR values were high but the SDR values
  were unsatisfactory.
- In contrast, with SPICA,
  we were able to obtain high SDR values
  without any serious deterioration in the separation performance (SIR).
- Even in a reverberant environment,
  we obtained reasonable results with our proposed SPICA method.
- It should be noted that
  the signal at the center position (s2) is hard to separate with either method.
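These observations can be checked against the tabulated values. The sketch below uses the female-male-male results for the simulated anechoic mixtures and reports, for each method, the mean SIR, the mean SDR, and the output with the lowest SIR. Averaging over outputs is a summary chosen here for illustration, not a figure reported by the authors.

```python
# (SIR, SDR) in dB, copied from the female-male-male table for the
# simulated anechoic mixtures; None marks the source removed in stage 1.
results = {
    "Sparse": {"y1": (17.6, 7.3), "y2": (11.6, 9.3), "y3": (17.3, 8.5)},
    "Case 1": {"y1": None, "y2": (8.8, 12.5), "y3": (13.6, 16.2)},
    "Case 2": {"y1": (17.5, 20.8), "y2": (8.8, 11.8), "y3": None},
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


summary = {}
for method, row in results.items():
    # Keep only the outputs that were actually produced by this method.
    present = {name: pair for name, pair in row.items() if pair is not None}
    summary[method] = {
        "mean_sir": mean(sir for sir, _ in present.values()),
        "mean_sdr": mean(sdr for _, sdr in present.values()),
        "lowest_sir_output": min(present, key=lambda name: present[name][0]),
    }

for method, stats in summary.items():
    print(method, stats)
```

For this table the output with the lowest SIR is y2 in every row, which matches the remark about the center position, and the mean SDR is higher in both SPICA cases than with sparseness alone.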