“Huh? What do you mean?” Summarize a long story short｜Exhibition Program｜NTT Communication Science Laboratories OPEN HOUSE 2022

Exhibition Program

Science of Media Information

14 “Huh? What do you mean?” Summarize a long story short

Robust speech summarization against speech recognition errors

Abstract

Speech summarization aims to create a summary from a long talk. It is an essential technology if we are to realize AI systems that can correctly understand human speech. One way to realize speech summarization is to cascade automatic speech recognition (ASR) and text summarization. A drawback of such cascade approaches is that ASR errors are difficult to avoid, and they degrade summarization performance. To alleviate this problem, we propose a speech summarization method that is robust to ASR errors. Our proposed system considers multiple ASR results and uses the context and the relationships between words to generate an accurate summary, even if each ASR result contains errors. The idea is general and can also be applied to other tasks such as speech translation. This research brings us one step closer to realizing machines that can deeply understand humans, not only by transcribing speech word by word but also by accessing its meaning and intent.
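As a rough illustration of how several ASR results might be combined, the sketch below fuses one vector per ASR hypothesis with scaled dot-product attention. It is a minimal toy under stated assumptions, not the system presented in this exhibit: the function names `softmax` and `fuse_hypotheses`, the `query` context vector, and the two-dimensional toy vectors are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_hypotheses(query, hypothesis_vectors):
    """Fuse one vector per ASR hypothesis into a single vector.

    query: hypothetical context vector (for example, a summarizer state).
    hypothesis_vectors: one vector per ASR hypothesis, same length as query.
    Returns (fused_vector, attention_weights).
    """
    dim = len(query)
    # Scaled dot-product score between the context and each hypothesis.
    scores = [
        sum(q * h for q, h in zip(query, vec)) / math.sqrt(dim)
        for vec in hypothesis_vectors
    ]
    weights = softmax(scores)
    # Weighted sum: hypotheses that agree with the context contribute more.
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, hypothesis_vectors))
        for i in range(dim)
    ]
    return fused, weights


# Toy data (assumed): two hypotheses close to the context, one far from it.
query = [1.0, 0.0]
hypotheses = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
fused, weights = fuse_hypotheses(query, hypotheses)
```

In this toy setup the third hypothesis, which disagrees with the context, receives the smallest weight, so an error confined to one ASR result has less influence on the fused representation.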

References

[1] T. Kano, A. Ogawa, M. Delcroix, S. Watanabe, “Attention-based multi-hypothesis fusion for speech summarization,” in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 487–494, 2021.

[2] T. Kano, A. Ogawa, M. Delcroix, S. Watanabe, “ASR hypothesis fusion using BERT for speech summarization,” in Proc. The 2022 Spring Meeting of the Acoustical Society of Japan (ASJ), 2022.

[3] T. Kano, A. Ogawa, M. Delcroix, S. Watanabe, “Integrating multiple ASR systems into NLP backend with attention fusion,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022.

Contact

Takatomo Kano / Signal Processing Research Group, Media Information Laboratory
Email: cs-openhouse-ml@hco.ntt.co.jp
