In this specific example, T-F masking was applied to each frequency bin independently. Such bin-wise separation is effective in situations like this one, where room reverberation cannot be ignored and the microphones built into the Roland R-09 have some directivity. However, once separation in each frequency bin is complete, the frequency components that originate from the same speaker must be grouped together. This is called the permutation problem. We solved it with our newly developed method, in which the activity of each separated signal is represented as a sequence and those sequences are clustered for each speaker.
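To make the permutation problem concrete, the following is a minimal sketch of one way activity sequences could be clustered per speaker. It is not the exact method used here: the function name `align_permutations`, the `(F, N, T)` array layout, the correlation-with-centroid criterion, and the exhaustive search over permutations are all assumptions made for illustration. The synthetic data stands in for real mask or power-ratio sequences.

```python
import itertools
import numpy as np

def align_permutations(activity, n_iter=10):
    """Hypothetical helper: align separated signals across frequency bins.

    activity: array of shape (F, N, T) holding one activity sequence
              (e.g. T-F mask values over T frames) for each of the N
              separated signals in each of the F frequency bins.
    Returns an (F, N) integer array; perms[f, k] is the index of the
    separated signal in bin f that is assigned to speaker k.
    """
    F, N, T = activity.shape
    # Normalize each sequence to zero mean and unit norm, so that a dot
    # product between two sequences equals their correlation coefficient.
    a = activity - activity.mean(axis=2, keepdims=True)
    a = a / (np.linalg.norm(a, axis=2, keepdims=True) + 1e-12)

    perms = np.tile(np.arange(N), (F, 1))        # start from the identity
    candidates = list(itertools.permutations(range(N)))
    bins = np.arange(F)[:, None]

    for _ in range(n_iter):
        # One centroid per speaker: mean of the sequences assigned to it.
        centroids = a[bins, perms].mean(axis=0)  # shape (N, T)
        new_perms = np.empty_like(perms)
        for f in range(F):
            corr = centroids @ a[f].T            # (speaker, signal)
            # Pick the permutation with the largest total correlation.
            new_perms[f] = max(
                candidates,
                key=lambda p: sum(corr[k, p[k]] for k in range(N)),
            )
        if np.array_equal(new_perms, perms):
            break                                # assignments are stable
        perms = new_perms
    return perms

# Synthetic example (assumed data): two speakers with different on/off
# patterns, observed in 16 bins, with the output order swapped in 5 bins.
rng = np.random.default_rng(0)
F, N, T = 16, 2, 200
base = (rng.random((N, T)) < 0.5).astype(float)
activity = np.repeat(base[None], F, axis=0) + 0.1 * rng.random((F, N, T))
swapped = [1, 4, 5, 9, 12]
activity[swapped] = activity[swapped][:, ::-1]

perms = align_permutations(activity)
aligned = activity[np.arange(F)[:, None], perms]  # speaker-ordered sequences
```

The exhaustive search over permutations is only practical for a small number of sources such as the two-microphone case here. The centroid update can also stall when the initial ordering is wrong in about half of the bins, so a more careful initialization would be needed in practice.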
Simultaneous utterances were recorded with a Roland R-09, which has two
microphones.