Crossmodal

ConceptBeam
Target speech extraction based on “concept” or semantic information.
The Places Japanese audio caption corpus
Japanese spoken captions for the Places205 image dataset.