Research Talk

AI that learns to listen on its own
Advancing self-supervised audio representation learning toward cutting-edge sound understanding with large language models
Daisuke Niizumi
Computational Modeling Research Group, Media Information Laboratory

Abstract

AI has improved its ability to understand media such as audio and images by learning to automatically extract useful features from data, a process known as representation learning. This talk introduces audio representation learning technologies that enable AI to interpret the diverse sounds in our environment. The learned representations help AI recognize different types of sounds, such as human voices and animal calls, or classify music genres. Recently, these technologies have progressed to self-supervised learning methods, which learn effective features from the natural patterns in the data itself rather than from human-provided annotations (e.g., labels), as traditional methods do. With the help of large language models, they are further evolving toward AI that understands sounds linguistically.
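The key point of self-supervised learning is that the training signal comes from the data itself, not from labels. A minimal illustrative sketch (hypothetical, not the speaker's specific method): hide random frames of an audio feature sequence and use the hidden frames themselves as prediction targets, as in masked-prediction pretext tasks.

```python
import random

def make_masked_prediction_pair(frames, mask_ratio=0.3, mask_value=0.0, seed=0):
    """Create a self-supervised (input, target) pair by masking frames.

    No labels are needed: the targets are the original frames that
    were hidden from the model. This is a toy sketch of the
    masked-prediction idea used in audio representation learning.
    """
    rng = random.Random(seed)
    n = len(frames)
    n_masked = max(1, int(n * mask_ratio))
    masked_idx = sorted(rng.sample(range(n), n_masked))
    inputs = list(frames)
    targets = {}
    for i in masked_idx:
        targets[i] = frames[i]  # ground truth comes from the data itself
        inputs[i] = mask_value  # hide the frame from the model
    return inputs, targets

# Toy 1-D "feature" sequence standing in for audio frames
frames = [0.1, 0.5, 0.9, 0.4, 0.2, 0.8]
inputs, targets = make_masked_prediction_pair(frames)
```

A model trained to reconstruct `targets` from `inputs` learns useful structure of the audio without any human annotation; the learned features can then be reused for tasks such as sound or music classification.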

Speaker
Daisuke Niizumi
Computational Modeling Research Group, Media Information Laboratory