Semi-supervised audio source separation with multichannel factorial hidden Markov model and source-filter model

Contributors: Takuya Higuchi (The University of Tokyo) and Hirokazu Kameoka (The University of Tokyo, NTT Corporation).

Examples of results in practical situations

We apply a source-filter model and integrate it into our previous model [1]. First, we model each source's spectrum as the product of a source (excitation) component and a filter component. Second, we extract the filter components from the training data using STRAIGHT [2]. Finally, we estimate the remaining parameters of our generative model from the observed signals, simultaneously solving source separation, audio event detection, dereverberation, and DOA estimation. In these experiments, the utterances in the training data differed in content from those in the observed signals, but they were spoken by the same speakers. We also performed diffuse noise reduction within the same unified framework, which we hope to present at an upcoming conference.
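To illustrate the first step, the sketch below decomposes a magnitude spectrum into a smooth filter (spectral envelope) component and a residual source (excitation) component, so that their product reconstructs the spectrum. This toy version uses simple cepstral liftering for the envelope; STRAIGHT's envelope extraction is far more elaborate, and the function and parameter names here are illustrative only.

```python
import numpy as np

def source_filter_decompose(spectrum, n_lifter=30):
    """Split a magnitude spectrum into spectrum = source * filter,
    where the filter is a smooth spectral envelope obtained by
    keeping only low-quefrency cepstral coefficients.
    """
    log_spec = np.log(np.maximum(spectrum, 1e-10))
    # Real cepstrum of the log spectrum.
    cepstrum = np.fft.irfft(log_spec, n=2 * (len(spectrum) - 1))
    # Lifter: keep low quefrencies (symmetrically) -> smooth envelope.
    lifter = np.zeros_like(cepstrum)
    lifter[:n_lifter] = 1.0
    lifter[-(n_lifter - 1):] = 1.0
    filter_comp = np.exp(np.fft.rfft(cepstrum * lifter).real)
    # Source (excitation) component is the remainder.
    source_comp = spectrum / filter_comp
    return source_comp, filter_comp

# Example: synthetic harmonic spectrum shaped by a formant-like envelope.
freqs = np.linspace(0.0, 1.0, 513)
envelope = np.exp(-((freqs - 0.2) ** 2) / 0.02) + 0.3
harmonics = (0.5 + 0.5 * np.cos(2 * np.pi * 40 * freqs)) ** 8
spec = envelope * harmonics + 1e-6
src, filt = source_filter_decompose(spec)
# By construction, the product of the two components recovers the spectrum.
assert np.allclose(src * filt, spec)
```

In the actual model, the filter components extracted this way from training utterances are held fixed, and only the source-side parameters are estimated from the observed mixture.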

Examples of learning data
speaker 1

speaker 2

cell phone





[1] T. Higuchi and H. Kameoka, ``Unified approach for audio source separation with multichannel factorial HMM and DOA mixture model,'' Proc. EUSIPCO 2015. [to appear]
[2] H. Kawahara, I. Masuda-Katsuse and A. de Cheveigné, ``Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction,'' Speech Communication, vol. 27, no. 3--4, pp. 187--207, 1999.