Visually-grounded speech

ConceptBeam
Target speech extraction based on a “concept”, that is, semantic information.