Media Information Laboratory
2020.7.1 Notice: An NTT team ranked first in automated audio captioning (Task 6) of this year's Detection and Classification of Acoustic Scenes and Events (DCASE2020) competition!
2020.2.13 Notice: Dr. Tomohiro Nakatani and Dr. Hirokazu Kameoka have been named to the list of AI 2000 Most Influential Scholars in the world!
2020.1.27 Notice: 29 papers have been accepted to ICASSP 2020 (International Conference on Acoustics, Speech and Signal Processing).
List of accepted papers:
- C. Boeddeker, T. Nakatani, K. Kinoshita, and R. Haeb-Umbach, "Jointly Optimal Dereverberation and Beamforming," Lecture
- M. Delcroix, T. Ochiai, K. Zmolikova, K. Kinoshita, N. Tawara, T. Nakatani, and S. Araki, "Improving Speaker Discrimination of Target Speech Extraction with Time-domain SpeakerBeam," Poster
- S. Emura, H. Sawada, S. Araki, and N. Harada, "A Frequency-domain BSS Method based on L1 Norm, Unitary Constraint, and Cayley Transform," Lecture
- M. Ihori, A. Takashima, and R. Masumura, "Large-Context Pointer-Generator Networks for Spoken-to-Written Style Conversion," Poster
- R. Ikeshita, T. Nakatani, and S. Araki, "Overdetermined Independent Vector Analysis," Poster
- K. Imoto, N. Tonami, Y. Koizumi, M. Yasuda, R. Yamanishi, and Y. Yamashita, "Sound Event Detection By Multitask Learning of Sound Events and Scenes with Soft Scene Labels," Poster
- M. Kawanaka, Y. Koizumi, R. Miyazaki, and K. Yatabe, "Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-box Cost Function," Poster
- K. Kinoshita, T. Ochiai, M. Delcroix, and T. Nakatani, "Improving Noise Robust Automatic Speech Recognition with Single-channel Time-domain Enhancement Network," Poster
- K. Kinoshita, M. Delcroix, S. Araki, and T. Nakatani, "Tackling Real Noisy Reverberant Meetings with All-neural Source Separation, Counting, and Diarization System," Poster
- Y. Koizumi, K. Yatabe, M. Delcroix, Y. Masuyama, and D. Takeuchi, "Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention," Lecture
- Y. Koizumi, M. Yasuda, S. Murata, S. Saito, H. Uematsu, and N. Harada, "SPIDERnet: Attention Network for One-shot Anomaly Detection in Sounds," Poster
- T. Kondo, K. Fukushige, N. Takamune, D. Kitamura, H. Saruwatari, R. Ikeshita, and T. Nakatani, "Convergence-guaranteed Independent Positive Semidefinite Tensor Analysis based on Student's t Distribution," Poster
- S. Kurihara, M. Fukui, S. Shimauchi, and N. Harada, "Objective Quality Estimation Using PESQ for Hands-free Terminals," Poster
- R. Masumura, M. Ihori, A. Takashima, T. Moriya, A. Ando, and Y. Shinohara, "Sequence-level consistency training for semi-supervised end-to-end automatic speech recognition," Poster
- Y. Masuyama, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Phase reconstruction based on recurrent phase unwrapping with deep neural networks," Poster
- T. Moriya, H. Sato, T. Tanaka, T. Ashihara, R. Masumura, and Y. Shinohara, "Distilling Attention Weights for CTC-based ASR Systems," Poster
- T. Nakatani, R. Takahashi, T. Ochiai, K. Kinoshita, R. Ikeshita, M. Delcroix, and S. Araki, "DNN-supported Mask-based Convolutional Beamforming for Simultaneous Denoising, Dereverberation, and Source Separation," Lecture
- H. Narimatsu and H. Kasai, "Overlapped State Hidden Semi-Markov Model for Grouped Multiple Sequences," Lecture
- T. von Neumann, K. Kinoshita, L. Drude, C. Boeddeker, M. Delcroix, T. Nakatani, and R. Haeb-Umbach, "End-to-end Training of Time Domain Audio Separation and Recognition," Poster
- T. Ochiai, M. Delcroix, R. Ikeshita, K. Kinoshita, T. Nakatani, and S. Araki, "BEAM-TASNET: Time-domain Audio Separation Network Meets Frequency-domain Beamformer,"
- Y. Ohishi, A. Kimura, T. Kawanishi, K. Kashino, D. Harwath, and J. Glass, "Trilingual Semantic Embeddings of Visually Grounded Speech with Self-attention Mechanisms," Lecture
- C. Schymura, T. Ochiai, M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, and D. Kolossa, "A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking," Poster
- D. Takeuchi, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Real-Time Speech Enhancement using Equilibriated RNN," Poster
- D. Takeuchi, K. Yatabe, Y. Koizumi, Y. Oikawa, and N. Harada, "Invertible DNN-based Nonlinear Time-Frequency Transform for Speech Enhancement," Poster
- N. Tawara, A. Ogawa, T. Iwata, M. Delcroix, and T. Ogawa, "Frame-level Phoneme-invariant Speaker Embedding for Text-independent Speaker Recognition on Extremely Short Utterances," Poster
- N. Tawara, H. Kamiyama, S. Kobashikawa, and A. Ogawa, "Improving Speaker-attribute Estimation by Voting based on Speaker Cluster Information," Poster
- X. Wu, T. Kawanishi, and K. Kashino, "Reflectance-guided, Contrast-accumulated Histogram Equalization," Poster
- M. Yasuda, Y. Koizumi, S. Saito, H. Uematsu, and K. Imoto, "Sound Event Localization based on Sound Intensity Vector Refined by DNN-based Denoising and Source Separation," Poster
- G. Zhang, K. Niwa, and W.B. Kleijn, "Projected Weight Regularization to Improve Neural Network Generalization," Poster