Media Information Laboratory

Dr. Noboru Harada
The Media Information Laboratory is organized into three research groups: media recognition, signal processing, and computing theory.
Technology now advances so rapidly that an idea often seems to have been realized almost as soon as someone imagines it. In media information processing, the gap between fundamental and applied research is steadily narrowing.
Under these circumstances, we not only pursue principled, theoretical approaches to a range of problems, but also try to learn as much as possible from real-world findings and experience. Our goal is to contribute to solving social problems and creating a prosperous society through our activities.


2021.1.28 Award

Rintaro Ikeshita has received the 49th Awaya Kiyoshi Science Promotion Award from the Acoustical Society of Japan.
Rintaro Ikeshita and Tomohiro Nakatani, "Multiplicative update algorithms for independent vector analysis," 2020 Autumn meeting of Acoustical Society of Japan, 1-1-13, 2020.

2021.1.21 Award

Onkar Krishna, Go Irie, Xiaomeng Wu, Takahito Kawanishi and Kunio Kashino have received a "Best Research Paper Award Honorable Mention" at the 26th Symposium on Sensing via Image Information.
Onkar Krishna, Go Irie, Xiaomeng Wu, Takahito Kawanishi and Kunio Kashino (2020). "Adaptive Spotting: 3D Point Cloud Object Search Based on Deep Reinforcement Learning," The 26th Symposium on Sensing via Image Information.

2020.11.30 Notice

Dr. Tomohiro Nakatani was elevated to IEEE Fellow, with the following citation:
*for contributions to far-field signal processing for speech enhancement and recognition*

2020.9.10 Award

Tsubasa Ochiai has received the 48th Awaya Kiyoshi Science Promotion Award from the Acoustical Society of Japan.
Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, and Tomohiro Nakatani, "Investigation of multimodal target speaker extraction utilizing audio-visual speaker clues," 2020 Spring meeting of Acoustical Society of Japan, 1-1-24, 2020.

2020.7.1 Notice

The NTT team ranked first in automated audio captioning (Task 6) of this year's Detection and Classification of Acoustic Scenes and Events (DCASE2020) competition!

2020.3.17 Award

Kou Tanaka has received the 47th Awaya Kiyoshi Science Promotion Award from the Acoustical Society of Japan.
Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko and Nobukatsu Hojo, "WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation," 2019 Autumn meeting of Acoustical Society of Japan, 3-4-3, 2019.

2020.2.13 Notice

Dr. Tomohiro Nakatani and Dr. Hirokazu Kameoka have been named to the list of AI 2000 Most Influential Scholars in the world!

» AI 2000 Most Influential Scholars (AMiner)

» AI 2000 Speech Recognition Most Influential Scholars (AMiner)

2020.1.27 Notice

29 papers have been accepted to ICASSP 2020 (International Conference on Acoustics, Speech and Signal Processing).

  • C. Boeddeker, T. Nakatani, K. Kinoshita, and R. Haeb-Umbach, "Jointly Optimal Dereverberation and Beamforming," Lecture
  • M. Delcroix, T. Ochiai, K. Zmolikova, K. Kinoshita, N. Tawara, T. Nakatani, and S. Araki, "Improving Speaker Discrimination of Target Speech Extraction with Time-domain SpeakerBeam," Poster
  • S. Emura, H. Sawada, S. Araki, and N. Harada, "A Frequency-domain BSS Method based on L1 Norm, Unitary Constraint, and Cayley Transform," Lecture
  • M. Ihori, A. Takashima, and R. Masumura, "Large-Context Pointer-Generator Networks for Spoken-to-Written Style Conversion," Poster
  • R. Ikeshita, T. Nakatani, and S. Araki, "Overdetermined Independent Vector Analysis," Poster
  • K. Imoto, N. Tonami, Y. Koizumi, M. Yasuda, R. Yamanishi, and Y. Yamashita, "Sound Event Detection By Multitask Learning of Sound Events and Scenes with Soft Scene Labels," Poster
  • M. Kawanaka, Y. Koizumi, R. Miyazaki, and K. Yatabe, "Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-box Cost Function," Poster
  • K. Kinoshita, T. Ochiai, M. Delcroix, and T. Nakatani, "Improving Noise Robust Automatic Speech Recognition with Single-channel Time-domain Enhancement Network," Poster
  • K. Kinoshita, M. Delcroix, S. Araki, and T. Nakatani, "Tackling Real Noisy Reverberant Meetings with All-neural Source Separation, Counting, and Diarization System," Poster
  • Y. Koizumi, K. Yatabe, M. Delcroix, Y. Masuyama, and D. Takeuchi, "Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention," Lecture
  • Y. Koizumi, M. Yasuda, S. Murata, S. Saito, H. Uematsu, and N. Harada, "SPIDERnet: Attention Network for One-shot Anomaly Detection in Sounds," Poster
  • T. Kondo, K. Fukushige, N. Takamune, D. Kitamura, H. Saruwatari, R. Ikeshita, and T. Nakatani, "Convergence-guaranteed Independent Positive Semidefinite Tensor Analysis based on Student's t Distribution," Poster
  • S. Kurihara, M. Fukui, S. Shimauchi, and N. Harada, "Objective Quality Estimation Using PESQ for Hands-free Terminals," Poster
  • R. Masumura, M. Ihori, A. Takashima, T. Moriya, A. Ando, and Y. Shinohara, "Sequence-level consistency training for semi-supervised end-to-end automatic speech recognition," Poster
  • Y. Masuyama, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Phase reconstruction based on recurrent phase unwrapping with deep neural networks," Poster
  • T. Moriya, H. Sato, T. Tanaka, T. Ashihara, R. Masumura, and Y. Shinohara, "Distilling Attention Weights for CTC-based ASR Systems," Poster
  • T. Nakatani, R. Takahashi, T. Ochiai, K. Kinoshita, R. Ikeshita, M. Delcroix, and S. Araki, "DNN-supported Mask-based Convolutional Beamforming for Simultaneous Denoising, Dereverberation, and Source Separation," Lecture
  • H. Narimatsu and H. Kasai, "Overlapped State Hidden Semi-Markov Model for Grouped Multiple Sequences," Lecture
  • T. von Neumann, K. Kinoshita, L. Drude, C. Boeddeker, M. Delcroix, T. Nakatani, and R. Haeb-Umbach, "End-to-end Training of Time Domain Audio Separation and Recognition," Poster
  • T. Ochiai, M. Delcroix, R. Ikeshita, K. Kinoshita, T. Nakatani, and S. Araki, "BEAM-TASNET: Time-domain Audio Separation Network Meets Frequency-domain Beamformer,"
  • Y. Ohishi, A. Kimura, T. Kawanishi, K. Kashino, D. Harwath, and J. Glass, "Trilingual Semantic Embeddings of Visually Grounded Speech with Self-attention Mechanisms," Lecture
  • C. Schymura, T. Ochiai, M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, and D. Kolossa, "A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking," Poster
  • D. Takeuchi, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Real-Time Speech Enhancement using Equilibriated RNN," Poster
  • D. Takeuchi, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Invertible DNN-based Nonlinear Time-Frequency Transform for Speech Enhancement," Poster
  • N. Tawara, A. Ogawa, T. Iwata, M. Delcroix, and T. Ogawa, "Frame-level Phoneme-invariant Speaker Embedding for Text-independent Speaker Recognition on Extremely Short Utterances," Poster
  • N. Tawara, H. Kamiyama, S. Kobashikawa, and A. Ogawa, "Improving Speaker-attribute Estimation by Voting based on Speaker Cluster Information," Poster
  • X. Wu, T. Kawanishi, and K. Kashino, "Reflectance-guided, Contrast-accumulated Histogram Equalization," Poster
  • M. Yasuda, Y. Koizumi, S. Saito, H. Uematsu, and K. Imoto, "Sound Event Localization based on Sound Intensity Vector Refined by DNN-based Denoising and Source Separation," Poster

2019.8.28 Award

Kou Tanaka has received the IEICE ISS Young Researcher's Award in Speech Field from the Speech Committee of the Acoustical Society of Japan.
Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko and Nobukatsu Hojo, "Sequence-to-Sequence Voice Conversion Using Context Preservation Mechanism," IEICE Technical Report, vol. 119, no. 188, SP2019-10, pp. 7-12, 2019.

