Underdetermined blind source separation for convolutive mixtures

We recorded mixtures of three people speaking simultaneously with two microphones, and then applied a blind source separation (BSS) technique to the mixtures. Because there were fewer microphones than speakers (underdetermined BSS), linear filtering estimated with standard independent component analysis (ICA) cannot separate the mixtures well: a linear filter with two inputs can cancel at most one interfering source. Instead, time-frequency (T-F) masking is commonly employed in such underdetermined cases.
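As a rough illustration of T-F masking, the sketch below applies binary masks to the short-time Fourier transform (STFT) of one microphone signal. It assumes that a label naming the dominant speaker at each T-F point is already available; how those labels are estimated is not shown here. The function name `apply_binary_masks` and the toy random data are hypothetical, and binary masks are only one simple choice of mask, not necessarily the one used for the results on this page.

```python
import numpy as np

def apply_binary_masks(X_ref, labels, n_sources):
    """Split one mixture STFT into n_sources masked STFTs.

    X_ref  : complex array (n_freq, n_frames), STFT of one microphone signal.
    labels : int array (n_freq, n_frames), index of the speaker assumed
             dominant at each T-F point (assumed to be given).
    Returns a complex array of shape (n_sources, n_freq, n_frames).
    """
    # One boolean mask per speaker: True where that speaker is dominant.
    masks = np.stack([labels == n for n in range(n_sources)])
    # Each T-F point of the mixture is assigned to exactly one speaker.
    return masks * X_ref[np.newaxis, :, :]

# Toy data standing in for a real STFT and real labels.
rng = np.random.default_rng(0)
n_freq, n_frames, n_sources = 5, 8, 3
X_ref = (rng.standard_normal((n_freq, n_frames))
         + 1j * rng.standard_normal((n_freq, n_frames)))
labels = rng.integers(0, n_sources, size=(n_freq, n_frames))

Y = apply_binary_masks(X_ref, labels, n_sources)
```

Because the masks partition the T-F plane, the masked STFTs add up to the original mixture STFT. Each one would then be converted back to a waveform with an inverse STFT.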

In this specific example, T-F masking was applied to each frequency bin independently. Such frequency bin-wise processing is effective in situations like this one, where the room reverberation cannot be ignored and the microphones built into the Roland R-09 have some directivity. However, once the separation in each frequency bin is complete, the frequency components that originate from the same speaker must be grouped together. This is called the permutation problem. We solved it with our newly developed method, in which the activity of each separated signal is represented as a sequence and those sequences are clustered for each speaker.
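The idea of grouping frequency components by their activity sequences can be sketched as follows. This is a simplified greedy variant written for illustration, not the clustering procedure of the referenced paper. It assumes that each frequency bin already provides one activity sequence per separated signal (for example, a per-frame dominance measure), and it aligns each bin to a running centroid by maximizing the correlation. The names `corr` and `align_permutations` and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def corr(a, b):
    """Correlation coefficient between two 1-D sequences."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def align_permutations(activity):
    """Greedy permutation alignment across frequency bins.

    activity : float array (n_freq, n_sources, n_frames).
    Returns an int array perms (n_freq, n_sources) such that
    activity[f, perms[f]] follows the source order of bin 0.
    """
    n_freq, n_sources, _ = activity.shape
    perms = np.zeros((n_freq, n_sources), dtype=int)
    perms[0] = np.arange(n_sources)
    centroid = activity[0].copy()  # running mean of the aligned sequences
    for f in range(1, n_freq):
        # Exhaustive search is feasible for a small number of sources.
        best = max(
            itertools.permutations(range(n_sources)),
            key=lambda p: sum(corr(activity[f, p[k]], centroid[k])
                              for k in range(n_sources)),
        )
        perms[f] = best
        centroid += (activity[f, perms[f]] - centroid) / (f + 1)
    return perms

# Synthetic check: the same on/off patterns, shuffled per bin, plus noise.
rng = np.random.default_rng(0)
n_freq, n_sources, n_frames = 16, 3, 200
true_activity = (rng.random((n_sources, n_frames)) > 0.5).astype(float)
shuffles = np.stack([rng.permutation(n_sources) for _ in range(n_freq)])
activity = np.stack([true_activity[s] for s in shuffles])
activity += 0.1 * rng.standard_normal(activity.shape)

perms = align_permutations(activity)
```

The sketch relies on the fact that the components of one speaker tend to be active in the same time frames across frequency bins, so their activity sequences correlate strongly, whereas sequences of different speakers do not.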

Recorded sound

Original, 44.1 kHz, 11 seconds, stereo
Down-sampled to 16 kHz, truncated to 9 seconds, stereo
This was used as the input to the separation system.
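One possible way to carry out this preprocessing is sketched below, assuming SciPy is available. A random stereo signal stands in for the actual recording, and `scipy.signal.resample_poly` is only one of several suitable resamplers; the tool actually used for the files on this page is not stated.

```python
import numpy as np
from scipy.signal import resample_poly

FS_IN, FS_OUT = 44100, 16000   # sampling rates in Hz
KEEP_SECONDS = 9

# Stand-in for the 11-second stereo recording: shape (samples, channels).
rng = np.random.default_rng(0)
x = rng.standard_normal((11 * FS_IN, 2))

# 16000 / 44100 reduces to 160 / 441; the anti-aliasing filter is built in.
y = resample_poly(x, up=160, down=441, axis=0)

# Keep only the first 9 seconds at the new sampling rate.
y = y[: KEEP_SECONDS * FS_OUT]
```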

Separation results

Recording setup

There were three human speakers.

Simultaneous utterances were recorded with a Roland R-09, which has two microphones.

Reference

H. Sawada, S. Araki, S. Makino, "Underdetermined Convolutive Blind Source Separation via Frequency Bin-Wise Clustering and Permutation Alignment," IEEE Trans. Audio, Speech, and Language Processing, vol.19, no.3, pp.516-527, March 2011. (Paper: PDF)