Media Information Laboratory

Dr. Noboru Harada,
Executive Manager

The Media Information Laboratory is organized into three research groups: media recognition, signal processing, and computing theory.
Today, technology advances so rapidly that an idea can seem to have been realized almost as soon as someone imagines it. In media information processing, the gap between fundamental and applied research keeps narrowing.
Under such circumstances, we not only pursue principled, theoretical approaches to various issues but also try to learn as much as possible from real-world findings and experience. Our goal is to contribute to solving social problems and creating a prosperous society through our activities.

News

2020.7.1 Notice

An NTT team took first place in automated audio captioning (Task 6) of this year's Detection and Classification of Acoustic Scenes and Events (DCASE2020) competition!
http://dcase.community/challenge2020/task-automatic-audio-captioning-results

2020.2.13 Notice

Dr. Tomohiro Nakatani and Dr. Hirokazu Kameoka have been named to the list of AI 2000 Most Influential Scholars in the world!

» AI 2000 Most Influential Scholars (AMiner)

» AI 2000 Speech Recognition Most Influential Scholars (AMiner)

2020.1.27 Notice

29 papers have been accepted to ICASSP 2020 (International Conference on Acoustics, Speech and Signal Processing).

List of accepted papers:

C. Boeddeker, T. Nakatani, K. Kinoshita, and R. Haeb-Umbach, "Jointly Optimal Dereverberation and Beamforming," Lecture
M. Delcroix, T. Ochiai, K. Zmolikova, K. Kinoshita, N. Tawara, T. Nakatani, and S. Araki, "Improving Speaker Discrimination of Target Speech Extraction with Time-domain SpeakerBeam," Poster
S. Emura, H. Sawada, S. Araki, and N. Harada, "A Frequency-domain BSS Method based on L1 Norm, Unitary Constraint, and Cayley Transform," Lecture
M. Ihori, A. Takashima, and R. Masumura, "Large-Context Pointer-Generator Networks for Spoken-to-Written Style Conversion," Poster
R. Ikeshita, T. Nakatani, and S. Araki, "Overdetermined Independent Vector Analysis," Poster
K. Imoto, N. Tonami, Y. Koizumi, M. Yasuda, R. Yamanishi, and Y. Yamashita, "Sound Event Detection By Multitask Learning of Sound Events and Scenes with Soft Scene Labels," Poster
M. Kawanaka, Y. Koizumi, R. Miyazaki, and K. Yatabe, "Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-box Cost Function," Poster
K. Kinoshita, T. Ochiai, M. Delcroix, and T. Nakatani, "Improving Noise Robust Automatic Speech Recognition with Single-channel Time-domain Enhancement Network," Poster
K. Kinoshita, M. Delcroix, S. Araki, and T. Nakatani, "Tackling Real Noisy Reverberant Meetings with All-neural Source Separation, Counting, and Diarization System," Poster
Y. Koizumi, K. Yatabe, M. Delcroix, Y. Masuyama, and D. Takeuchi, "Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention," Lecture
Y. Koizumi, M. Yasuda, S. Murata, S. Saito, H. Uematsu, and N. Harada, "SPIDERnet: Attention Network for One-shot Anomaly Detection in Sounds," Poster
T. Kondo, K. Fukushige, N. Takamune, D. Kitamura, H. Saruwatari, R. Ikeshita, and T. Nakatani, "Convergence-guaranteed Independent Positive Semidefinite Tensor Analysis based on Student's t Distribution," Poster
S. Kurihara, M. Fukui, S. Shimauchi, and N. Harada, "Objective Quality Estimation Using PESQ for Hands-free Terminals," Poster
R. Masumura, M. Ihori, A. Takashima, T. Moriya, A. Ando, and Y. Shinohara, "Sequence-level consistency training for semi-supervised end-to-end automatic speech recognition," Poster
Y. Masuyama, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Phase reconstruction based on recurrent phase unwrapping with deep neural networks," Poster
T. Moriya, H. Sato, T. Tanaka, T. Ashihara, R. Masumura, and Y. Shinohara, "Distilling Attention Weights for CTC-based ASR Systems," Poster
T. Nakatani, R. Takahashi, T. Ochiai, K. Kinoshita, R. Ikeshita, M. Delcroix, and S. Araki, "DNN-supported Mask-based Convolutional Beamforming for Simultaneous Denoising, Dereverberation, and Source Separation," Lecture
H. Narimatsu and H. Kasai, "Overlapped State Hidden Semi-Markov Model for Grouped Multiple Sequences," Lecture
T. von Neumann, K. Kinoshita, L. Drude, C. Boeddeker, M. Delcroix, T. Nakatani, and R. Haeb-Umbach, "End-to-end Training of Time Domain Audio Separation and Recognition," Poster
T. Ochiai, M. Delcroix, R. Ikeshita, K. Kinoshita, T. Nakatani, and S. Araki, "BEAM-TASNET: Time-domain Audio Separation Network Meets Frequency-domain Beamformer,"
Y. Ohishi, A. Kimura, T. Kawanishi, K. Kashino, D. Harwath, and J. Glass, "Trilingual Semantic Embeddings of Visually Grounded Speech with Self-attention Mechanisms," Lecture
C. Schymura, T. Ochiai, M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, and D. Kolossa, "A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking," Poster
D. Takeuchi, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Real-Time Speech Enhancement using Equilibriated RNN," Poster
D. Takeuchi, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Invertible DNN-based Nonlinear Time-Frequency Transform for Speech Enhancement," Poster
N. Tawara, A. Ogawa, T. Iwata, M. Delcroix, and T. Ogawa, "Frame-level Phoneme-invariant Speaker Embedding for Text-independent Speaker Recognition on Extremely Short Utterances," Poster
N. Tawara, H. Kamiyama, S. Kobashikawa, and A. Ogawa, "Improving Speaker-attribute Estimation by Voting based on Speaker Cluster Information," Poster
X. Wu, T. Kawanishi, and K. Kashino, "Reflectance-guided, Contrast-accumulated Histogram Equalization," Poster
M. Yasuda, Y. Koizumi, S. Saito, H. Uematsu, and K. Imoto, "Sound Event Localization based on Sound Intensity Vector Refined by DNN-based Denoising and Source Separation," Poster
G. Zhang, K. Niwa, and W.B. Kleijn, "Projected Weight Regularization to Improve Neural Network Generalization," Poster