Publications
Automatic Speech Recognition (ASR)
- T.Kawahara.
Automatic meeting transcription system for the Japanese Parliament (Diet).
In Proc. APSIPA ASC, (overview talk), 2017.
- K.Matsuura, S.Ueno, M.Mimura, S.Sakai, and T.Kawahara.
Speech corpus of Ainu folklore and end-to-end speech recognition
for Ainu language.
In Proc. Int'l Conf. Language Resources & Evaluation (LREC),
pp.2622--2628, 2020.
- H.Futami, H.Inaguma, S.Ueno, M.Mimura, S.Sakai, and T.Kawahara.
Distilling the knowledge of BERT for sequence-to-sequence ASR.
In Proc. INTERSPEECH, pp.3635--3639, 2020.
- S.Ueno, H.Inaguma, M.Mimura, and T.Kawahara.
Acoustic-to-word attention-based model complemented with
character-level CTC-based model.
In Proc. IEEE-ICASSP, pp.5804--5808, 2018.
Speech Emotion Recognition (SER)
- Y.Gao, C.Chu, and T.Kawahara.
Two-stage finetuning of wav2vec 2.0 for speech emotion recognition
with ASR and gender pretraining.
In Proc. INTERSPEECH, pp.3635--3639, 2023.
- H.Feng, S.Ueno, and T.Kawahara.
End-to-end speech emotion recognition combined with acoustic-to-word
ASR model.
In Proc. INTERSPEECH, pp.501--505, 2020.
Robust Speech Recognition
- H.Shi, M.Mimura, L.Wang, J.Dang, and T.Kawahara.
Time-domain speech enhancement assisted by multi-resolution frequency
encoder and decoder.
In Proc. IEEE-ICASSP, 2023.
- K.Shimada, Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
Unsupervised speech enhancement based on multichannel NMF-informed
beamforming for noise-robust automatic speech recognition.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.27, No.5, pp.960--971, 2019.
Source Separation and Speech Enhancement
- K.Sekiguchi, Y.Bando, A.A.Nugraha, K.Yoshii, and T.Kawahara.
Fast multichannel nonnegative matrix factorization with
directivity-aware jointly-diagonalizable spatial covariance matrices for
blind source separation.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.28,
pp.2610--2625, 2020.
- Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
Statistical speech enhancement based on probabilistic integration of
variational autoencoder and non-negative matrix factorization.
In Proc. IEEE-ICASSP, pp.716--720, 2018.
Spoken Language Understanding (SLU)
- T.Zhao and T.Kawahara.
Joint dialog act segmentation and recognition in human conversations
using attention to dialog context.
Computer Speech and Language, Vol.50, pp.108--127, 2019.
- T.V.Dang, T.Zhao, S.Ueno, H.Inaguma, and T.Kawahara.
End-to-end speech-to-dialog-act recognition.
In Proc. INTERSPEECH, pp.3910--3914, 2020.
Spoken Dialogue Systems (SDS)
- T.Kawahara.
Spoken dialogue system for a human-like conversational robot ERICA.
In Proc. Int'l Workshop Spoken Dialogue Systems (IWSDS), (keynote speech), 2018.
- K.Inoue, K.Hara, D.Lala, K.Yamamoto, S.Nakamura, K.Takanashi, and T.Kawahara.
Job interviewer android with elaborate follow-up question generation.
In Proc. ICMI, pp.324--332, 2020.
- K.Inoue, D.Lala, K.Yamamoto, S.Nakamura, K.Takanashi, and T.Kawahara.
An attentive listening system with android ERICA: Comparison of
autonomous and WOZ interactions.
In Proc. SIGdial Meeting Discourse & Dialogue, pp.118--127,
2020.
- T.Kawahara, N.Muramatsu, K.Yamamoto, D.Lala, and K.Inoue.
Semi-autonomous avatar enabling unconstrained parallel conversations
--seamless hybrid of WOZ and autonomous dialogue systems--.
Advanced Robotics, Vol.35, No.11, pp.657--663, 2021.
Interaction Analysis and Modeling
- K.Yamamoto, K.Inoue, and T.Kawahara.
Character expression for spoken dialogue systems with semi-supervised learning using variational auto-encoder.
Computer Speech and Language, Vol.79, Article No.101469, 2023.
- K.Inoue, D.Lala, and T.Kawahara.
Can a robot laugh with you?: Shared laughter generation for empathetic spoken dialogue.
Frontiers in Robotics and AI (Computational Intelligence in Robotics), Vol.9, Article 933261, pp.1--11, 2022.
- T.Kawahara, T.Yamaguchi, K.Inoue, K.Takanashi, and N.Ward.
Prediction and generation of backchannel form for attentive listening systems.
In Proc. INTERSPEECH, pp.2890--2894, 2016.
Multi-modal Conversation Analysis
- K.Inoue, D.Lala, K.Takanashi, and T.Kawahara.
Engagement recognition by a latent character model based on
multimodal listener behaviors in spoken dialogue.
APSIPA Trans. Signal & Information Process., Vol.7, No.e9,
pp.1--16, 2018.
- T.Kawahara, T.Iwatate, K.Inoue, S.Hayashi, H.Yoshimoto, and K.Takanashi.
Multi-modal sensing and analysis of poster conversations with smart posterboard.
APSIPA Trans. Signal & Information Process., Vol.5, No.e2, pp.1--12, 2016.
Natural Language Processing for Rich Transcription
- J.Nozaki, T.Kawahara, K.Ishizuka, and T.Hashimoto.
End-to-end speech-to-punctuated-text recognition.
In Proc. INTERSPEECH, pp.1811--1815, 2022.
- M.Mimura, S.Sakai, and T.Kawahara.
An end-to-end model from speech to clean transcript for parliamentary meetings.
In Proc. APSIPA ASC, pp.465--470, 2021.
Computer Assisted Language Learning (CALL)
- R.Duan, T.Kawahara, M.Dantsuji, and H.Nanjo.
Cross-lingual transfer learning of non-native acoustic modeling for
pronunciation error detection and diagnosis.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.28,
No.1, pp.391--401, 2020.
- M.Mirzaei, K.Meshgi, and T.Kawahara.
Exploiting automatic speech recognition errors to enhance partial and synchronized caption for facilitating second language listening.
Computer Speech and Language, Vol.49, pp.17--36, 2018.
Large Vocabulary Continuous Speech Recognition Platform
- A.Lee and T.Kawahara.
Recent development of open-source speech recognition engine Julius.
In Proc. APSIPA ASC, pp.131--137, 2009.
- T.Kawahara, A.Lee, K.Takeda, K.Itou, and K.Shikano.
Recent progress of open-source LVCSR engine Julius and Japanese model repository.
In Proc. ICSLP, pp.3069--3072, 2004.