Providing a diverse range of appealing voices｜Exhibition Program｜NTT Communication Science Laboratories OPEN HOUSE 2026

Exhibition Program

Media Information Science

09 Providing a diverse range of appealing voices: An idol voice dataset for research on speech generation AI

Abstract

Research data for speech generation AI are often biased toward acted speech produced by professional speakers such as voice actors. As a first step toward AI systems that generate speech personalized to each user's preferences and perceived as appealing, we have constructed the Japanese Idol Speech (JIS) corpus [1], a multi-speaker speech dataset whose speakers are live idols with a wide range of vocal characteristics. JIS is the first large-scale idol speech dataset for which contractual arrangements permitting research use have been properly established; it comprises over 200 speakers and approximately 30 hours of audio. In addition to the read speech and everyday conversational utterances commonly found in existing speech datasets, JIS includes speaking styles distinctive to idols, such as utterances simulating Instax photo session events. By learning from the voices of diverse individuals, we aim to develop an AI system that enables anyone to refine their own voice flexibly and effectively.

References

[1] Y. Kondo, H. Kameoka, K. Tanaka, T. Kaneko, “JIS: A Speech Corpus of Japanese Idol Speakers with Various Speaking Styles,” in Proc. INTERSPEECH, pp. 4783-4787, 2025.

Contact

Yuto Kondo, Computational Modeling Research Group, Media Information Laboratory
