Research Talk

AI that learns to listen on its own
Advancing self-supervised audio representation learning toward cutting-edge sound understanding with large language models
Daisuke Niizumi
Computational Modeling Research Group, Media Information Laboratory

Abstract

AI has improved its ability to understand media such as audio and images by learning to automatically extract useful features from data, a process known as representation learning. This talk introduces audio representation learning technologies that enable AI to interpret the diverse sounds in our environment. The learned representations help AI recognize different types of sounds, such as human voices and animal calls, or classify music genres. Recently, these technologies have progressed to self-supervised learning methods, which learn effective features from the natural patterns in the data itself rather than from human-provided annotations (e.g., labels), as traditional methods do. With the help of large language models, they are further evolving toward AI that understands sounds linguistically.
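The key point of self-supervised learning is that the training signal comes from the data itself, not from labels. A minimal illustrative sketch (hypothetical, not the speaker's specific method): hide random frames of an audio feature sequence and use the hidden frames themselves as prediction targets, as in masked-prediction pretext tasks.

```python
import random

def make_masked_prediction_pair(frames, mask_ratio=0.3, mask_value=0.0, seed=0):
    """Create a self-supervised (input, target) pair by masking frames.

    No labels are needed: the targets are the original frames that
    were hidden from the model. This is a toy sketch of the
    masked-prediction idea used in audio representation learning.
    """
    rng = random.Random(seed)
    n = len(frames)
    n_masked = max(1, int(n * mask_ratio))
    masked_idx = sorted(rng.sample(range(n), n_masked))
    inputs = list(frames)
    targets = {}
    for i in masked_idx:
        targets[i] = frames[i]  # ground truth comes from the data itself
        inputs[i] = mask_value  # hide the frame from the model
    return inputs, targets

# Toy 1-D "feature" sequence standing in for audio frames
frames = [0.1, 0.5, 0.9, 0.4, 0.2, 0.8]
inputs, targets = make_masked_prediction_pair(frames)
```

A model trained to reconstruct `targets` from `inputs` learns useful structure of the audio without any human annotation; the learned features can then be reused for tasks such as sound or music classification.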

Speaker
Daisuke Niizumi
Computational Modeling Research Group, Media Information Laboratory