Publications

Automatic Speech Recognition (ASR)

  • T.Kawahara.
    Automatic meeting transcription system for the Japanese Parliament (Diet).
    In Proc. APSIPA ASC, (overview talk), 2017. (PDF file)
  • K.Matsuura, S.Ueno, M.Mimura, S.Sakai, and T.Kawahara.
    Speech corpus of Ainu folklore and end-to-end speech recognition for Ainu language.
    In Proc. Int'l Conf. Language Resources & Evaluation (LREC), pp.2622--2628, 2020. (PDF file)
  • H.Futami, H.Inaguma, S.Ueno, M.Mimura, S.Sakai, and T.Kawahara.
    Distilling the knowledge of BERT for sequence-to-sequence ASR.
    In Proc. INTERSPEECH, pp.3635--3639, 2020. (PDF file)
  • S.Ueno, H.Inaguma, M.Mimura, and T.Kawahara.
    Acoustic-to-word attention-based model complemented with character-level CTC-based model.
    In Proc. IEEE-ICASSP, pp.5804--5808, 2018. (PDF file)

Speech Emotion Recognition (SER)

  • Y.Gao, C.Chu, and T.Kawahara.
    Two-stage finetuning of wav2vec 2.0 for speech emotion recognition with ASR and gender pretraining.
    In Proc. INTERSPEECH, pp.3635--3639, 2023. (PDF file)
  • H.Feng, S.Ueno, and T.Kawahara.
    End-to-end speech emotion recognition combined with acoustic-to-word ASR model.
    In Proc. INTERSPEECH, pp.501--505, 2020. (PDF file)

Robust Speech Recognition

  • H.Shi, M.Mimura, L.Wang, J.Dang, and T.Kawahara.
    Time-domain speech enhancement assisted by multi-resolution frequency encoder and decoder.
    In Proc. IEEE-ICASSP, 2023. (PDF file)
  • K.Shimada, Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
    Unsupervised speech enhancement based on multichannel NMF-informed beamforming for noise-robust automatic speech recognition.
    IEEE/ACM Trans. Audio, Speech & Language Process., Vol.27, No.5, pp.960--971, 2019. (text) (KURENAI)

Source Separation and Speech Enhancement

  • K.Sekiguchi, Y.Bando, A.A.Nugraha, K.Yoshii, and T.Kawahara.
    Fast multichannel nonnegative matrix factorization with directivity-aware jointly-diagonalizable spatial covariance matrices for blind source separation.
    IEEE/ACM Trans. Audio, Speech & Language Process., Vol.28, pp.2610--2625, 2020. (text)
  • Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
    Statistical speech enhancement based on probabilistic integration of variational autoencoder and non-negative matrix factorization.
    In Proc. IEEE-ICASSP, pp.716--720, 2018. (PDF file)

Spoken Language Understanding (SLU)

  • T.Zhao and T.Kawahara.
    Joint dialog act segmentation and recognition in human conversations using attention to dialog context.
    Computer Speech and Language, Vol.57, pp.108--127, 2019. (text)
  • T.V.Dang, T.Zhao, S.Ueno, H.Inaguma, and T.Kawahara.
    End-to-end speech-to-dialog-act recognition.
    In Proc. INTERSPEECH, pp.3910--3914, 2020. (PDF file)

Spoken Dialogue Systems (SDS)

  • T.Kawahara.
    Spoken dialogue system for a human-like conversational robot ERICA.
    In Proc. Int'l Workshop Spoken Dialogue Systems (IWSDS), (keynote speech), 2018. (PDF file)
  • K.Inoue, K.Hara, D.Lala, K.Yamamoto, S.Nakamura, K.Takanashi, and T.Kawahara.
    Job interviewer android with elaborate follow-up question generation.
    In Proc. ICMI, pp.324--332, 2020. (PDF file)
  • K.Inoue, D.Lala, K.Yamamoto, S.Nakamura, K.Takanashi, and T.Kawahara.
    An attentive listening system with android ERICA: Comparison of autonomous and WOZ interactions.
    In Proc. SIGdial Meeting Discourse & Dialogue, pp.118--127, 2020. (PDF file)
  • T.Kawahara, N.Muramatsu, K.Yamamoto, D.Lala, and K.Inoue.
    Semi-autonomous avatar enabling unconstrained parallel conversations --seamless hybrid of WOZ and autonomous dialogue systems--.
    Advanced Robotics, Vol.35, No.11, pp.657--663, 2021. (text)

Interaction Analysis and Model

  • K.Yamamoto, K.Inoue, and T.Kawahara.
    Character expression for spoken dialogue systems with semi-supervised learning using variational auto-encoder.
    Computer Speech and Language, Vol.79, No.101469, 2023. (text)
  • K.Inoue, D.Lala, and T.Kawahara.
    Can a robot laugh with you?: Shared laughter generation for empathetic spoken dialogue.
    Frontiers in Robotics and AI (Computational Intelligence in Robotics), Vol.9, No.933261, pp.1--11, 2022. (text) (KURENAI)
  • T.Kawahara, T.Yamaguchi, K.Inoue, K.Takanashi, and N.Ward.
    Prediction and generation of backchannel form for attentive listening systems.
    In Proc. INTERSPEECH, pp.2890--2894, 2016. (PDF file)

Multi-modal Conversation Analysis

  • K.Inoue, D.Lala, K.Takanashi, and T.Kawahara.
    Engagement recognition by a latent character model based on multimodal listener behaviors in spoken dialogue.
    APSIPA Trans. Signal & Information Process., Vol.7, No.e9, pp.1--16, 2018. (text)
  • T.Kawahara, T.Iwatate, K.Inoue, S.Hayashi, H.Yoshimoto, and K.Takanashi.
    Multi-modal sensing and analysis of poster conversations with smart posterboard.
    APSIPA Trans. Signal & Information Process., Vol.5, No.e2, pp.1--12, 2016. (text)

Natural Language Processing for Rich Transcription

  • J.Nozaki, T.Kawahara, K.Ishizuka, and T.Hashimoto.
    End-to-end speech-to-punctuated-text recognition.
    In Proc. INTERSPEECH, pp.1811--1815, 2022. (PDF file)
  • M.Mimura, S.Sakai, and T.Kawahara.
    An end-to-end model from speech to clean transcript for parliamentary meetings.
    In Proc. APSIPA ASC, pp.465--470, 2021. (PDF file)

Computer Assisted Language Learning (CALL)

  • R.Duan, T.Kawahara, M.Dantsuji, and H.Nanjo.
    Cross-lingual transfer learning of non-native acoustic modeling for pronunciation error detection and diagnosis.
    IEEE/ACM Trans. Audio, Speech & Language Process., Vol.28, No.1, pp.391--401, 2020. (text) (KURENAI)
  • M.Mirzaei, K.Meshgi, and T.Kawahara.
    Exploiting automatic speech recognition errors to enhance partial and synchronized caption for facilitating second language listening.
    Computer Speech and Language, Vol.49, pp.17--36, 2018. (text)

Large Vocabulary Continuous Speech Recognition Platform

  • A.Lee and T.Kawahara.
    Recent development of open-source speech recognition engine Julius.
    In Proc. APSIPA ASC, pp.131--137, 2009. (PDF file)
  • T.Kawahara, A.Lee, K.Takeda, K.Itou, and K.Shikano.
    Recent progress of open-source LVCSR engine Julius and Japanese model repository.
    In Proc. ICSLP, pp.3069--3072, 2004. (PDF file)