Underdetermined blind source separation for convolutive mixtures
We recorded mixtures of three human speakers talking simultaneously
with two microphones, and then applied a blind source separation (BSS)
technique to the mixtures. Because there were fewer microphones than
speakers (underdetermined BSS), linear filtering estimated with
standard independent component analysis (ICA) cannot separate the
mixtures effectively. Instead, time-frequency (T-F) masking is
commonly employed in such underdetermined cases.
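
As a rough illustration of T-F masking within a single frequency bin, the sketch below assigns each time frame to one source and keeps only the frames assigned to that source. It is not the method of the referenced paper: the feature (inter-microphone phase difference), the fixed `centroids`, and the function names `binary_masks` and `apply_mask` are all assumptions made for this example.

```python
import cmath

def binary_masks(x1, x2, centroids):
    """Build one binary mask per source for a single frequency bin.

    x1, x2    : complex STFT coefficients of microphones 1 and 2, one per frame
    centroids : assumed phase-difference centroid (radians) for each source
    """
    labels = []
    for a, b in zip(x1, x2):
        # Inter-microphone phase difference of this T-F point (illustrative feature).
        phase = cmath.phase(b * a.conjugate())
        # Distance on the unit circle, so that phase wrapping is handled.
        dists = [abs(cmath.exp(1j * phase) - cmath.exp(1j * c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    # masks[k][t] is 1.0 if frame t is assigned to source k, else 0.0.
    return [[1.0 if lab == k else 0.0 for lab in labels]
            for k in range(len(centroids))]

def apply_mask(x, mask):
    """Keep only the T-F points selected by the mask."""
    return [m * v for m, v in zip(mask, x)]
```

With three centroids and two microphones, each frame is attributed to exactly one of three sources, which is how masking can recover more sources than there are microphones, provided the sources rarely overlap at the same T-F point.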
In this specific example, T-F masking was applied to each frequency
bin independently. Such bin-wise (frequency-independent) processing
is effective in situations like this one, where the room reverberation
cannot be ignored and the microphones built into the Roland R-09 have
some directivity. However, after the separation in each frequency bin
is completed, we need to group the frequency components that originate
from the same speaker together. This problem is called the
permutation problem. We solved this problem with our newly
developed method, in which the activity of each separated signal is
represented by a sequence and those sequences are clustered by
speaker.
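
The idea of aligning permutations through activity sequences can be sketched as follows. This is a simplified illustration, not the algorithm of the referenced paper: it assumes each separated output in each bin already has an activity sequence (for example, its mask values over time), uses the first bin to fix the speaker order, and greedily picks, for every other bin, the permutation whose sequences correlate best with running per-speaker centroids. The names `corr` and `align_permutations` are hypothetical.

```python
from itertools import permutations

def corr(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def align_permutations(activity):
    """Align separated outputs across frequency bins.

    activity[f][k] is the activity sequence of output k in bin f.
    Returns perms, where perms[f][s] is the output index in bin f
    assigned to speaker s.
    """
    n_src = len(activity[0])
    # Assumption: bin 0 defines the speaker order; its sequences seed the centroids.
    centroids = [list(seq) for seq in activity[0]]
    perms = [tuple(range(n_src))]
    for f in range(1, len(activity)):
        # Exhaustive search is cheap for three sources (3! = 6 candidates).
        best = max(
            permutations(range(n_src)),
            key=lambda p: sum(corr(activity[f][p[s]], centroids[s])
                              for s in range(n_src)),
        )
        perms.append(best)
        # Update each centroid as the running mean of the aligned sequences.
        for s in range(n_src):
            centroids[s] = [(c * f + v) / (f + 1)
                            for c, v in zip(centroids[s], activity[f][best[s]])]
    return perms
```

The sketch relies on the observation that a speaker tends to be active or silent at the same time across all frequencies, so activity sequences belonging to the same speaker correlate strongly even when their bin-wise output indices differ.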
Recorded sound
Separation results
- 1st speaker
- 2nd speaker
- 3rd speaker
Recording setup
There were three human speakers.
Simultaneous utterances were recorded with a Roland R-09, which has two
microphones.
Reference
- H. Sawada, S. Araki, S. Makino,
"Underdetermined Convolutive Blind Source Separation via Frequency Bin-Wise Clustering and Permutation Alignment,"
IEEE Trans. Audio, Speech, and Language Processing, vol.19, no.3, pp.516-527, March 2011.
(Paper: PDF)