Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Approach

Abstract

Because acoustic scenes and sound events are closely related, several previous studies have explored their joint analysis using multitask learning (MTL)-based neural networks. In these conventional approaches, sound event detection in the MTL model is supervised with either strong or weak labels. However, annotating strong event labels is quite time-consuming, whereas using only weak labels leads to low sound event detection performance. In this paper, we therefore propose a method for the joint analysis of acoustic scenes and sound events based on an MTL framework that uses both strong and weak labels of sound events. Our experimental results, obtained on subsets of the TUT Acoustic Scenes 2016/2017 and TUT Sound Events 2016/2017 datasets, show that the proposed method achieves reasonable performance even with a small amount of strongly labeled data combined with a large amount of weakly labeled data.
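To make the setup concrete, below is a minimal PyTorch sketch of an MTL model of the kind the abstract describes: a shared feature extractor feeding a clip-level scene classification head and a frame-level event detection head, trained with a scene loss plus strong (frame-level) and weak (clip-level) event losses. This is not the paper's implementation; all module names, layer sizes, pooling choices, and loss weights (`alpha`, `beta`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTLSceneEventNet(nn.Module):
    """Hypothetical multitask model: shared CNN encoder, scene head, event head."""

    def __init__(self, n_mels=64, n_scenes=4, n_events=25):
        super().__init__()
        # Shared convolutional encoder over log-mel input (batch, 1, frames, n_mels).
        self.shared = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        feat_dim = 64 * (n_mels // 16)
        self.scene_head = nn.Linear(feat_dim, n_scenes)  # one label per clip
        self.event_head = nn.Linear(feat_dim, n_events)  # logits per frame

    def forward(self, x):
        h = self.shared(x)                              # (B, C, T, F')
        h = h.permute(0, 2, 1, 3).flatten(2)            # (B, T, C*F')
        scene_logits = self.scene_head(h.mean(dim=1))   # clip-level scene logits
        event_frame = torch.sigmoid(self.event_head(h)) # frame-level event probs
        # Weak (clip-level) event probabilities via max pooling over time.
        event_clip = event_frame.max(dim=1).values
        return scene_logits, event_frame, event_clip

def mtl_loss(scene_logits, event_frame, event_clip,
             scene_y, strong_y, weak_y, has_strong, alpha=1.0, beta=1.0):
    """Combine scene CE with strong and weak event BCE terms.

    has_strong is a (B,) bool mask marking clips that carry frame-level
    labels; weak-only clips contribute only to the clip-level term.
    """
    scene = F.cross_entropy(scene_logits, scene_y)
    weak = F.binary_cross_entropy(event_clip, weak_y)
    if has_strong.any():
        strong = F.binary_cross_entropy(event_frame[has_strong],
                                        strong_y[has_strong])
    else:
        strong = event_frame.new_zeros(())
    return scene + alpha * strong + beta * weak
```

The key semi-supervised idea this sketch illustrates is that a single event branch serves both supervision levels: frame-level outputs are compared against strong labels where they exist, and the same outputs, pooled over time, are compared against weak labels for every clip, so a large weakly labeled set can supplement a small strongly labeled one.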

Publication
In APSIPA