Publications
Automatic Speech Recognition (ASR)
- T.Kawahara.
Automatic meeting transcription system for the Japanese Parliament (Diet).
In Proc. APSIPA ASC, (overview talk), 2017.
- K.Matsuura, S.Ueno, M.Mimura, S.Sakai, and T.Kawahara.
Speech corpus of Ainu folklore and end-to-end speech recognition
for Ainu language.
In Proc. Int'l Conf. Language Resources & Evaluation (LREC),
pp.2622--2628, 2020.
- H.Futami, H.Inaguma, S.Ueno, M.Mimura, S.Sakai, and T.Kawahara.
Distilling the knowledge of BERT for sequence-to-sequence ASR.
In Proc. INTERSPEECH, pp.3635--3639, 2020.
- S.Ueno, H.Inaguma, M.Mimura, and T.Kawahara.
Acoustic-to-word attention-based model complemented with
character-level CTC-based model.
In Proc. IEEE-ICASSP, pp.5804--5808, 2018.
Speech Emotion Recognition (SER)
- Y.Gao, C.Chu, and T.Kawahara.
Two-stage finetuning of wav2vec 2.0 for speech emotion recognition
with ASR and gender pretraining.
In Proc. INTERSPEECH, pp.3635--3639, 2023.
- H.Feng, S.Ueno, and T.Kawahara.
End-to-end speech emotion recognition combined with acoustic-to-word
ASR model.
In Proc. INTERSPEECH, pp.501--505, 2020.
Robust Speech Recognition
- H.Shi, M.Mimura, L.Wang, J.Dang, and T.Kawahara.
Time-domain speech enhancement assisted by multi-resolution frequency
encoder and decoder.
In Proc. IEEE-ICASSP, 2023.
- K.Shimada, Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
Unsupervised speech enhancement based on multichannel NMF-informed
beamforming for noise-robust automatic speech recognition.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.27, No.5, pp.960--971, 2019.
Source Separation and Speech Enhancement
- K.Sekiguchi, Y.Bando, A.A.Nugraha, K.Yoshii, and T.Kawahara.
Fast multichannel nonnegative matrix factorization with
directivity-aware jointly-diagonalizable spatial covariance matrices for
blind source separation.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.28,
pp.2610--2625, 2020.
- Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
Statistical speech enhancement based on probabilistic integration of
variational autoencoder and non-negative matrix factorization.
In Proc. IEEE-ICASSP, pp.716--720, 2018.
Spoken Language Understanding (SLU)
- T.Zhao and T.Kawahara.
Joint dialog act segmentation and recognition in human conversations
using attention to dialog context.
Computer Speech and Language, Vol.50, pp.108--127, 2019.
- T.V.Dang, T.Zhao, S.Ueno, H.Inaguma, and T.Kawahara.
End-to-end speech-to-dialog-act recognition.
In Proc. INTERSPEECH, pp.3910--3914, 2020.
Spoken Dialogue Systems (SDS)
- T.Kawahara.
Spoken dialogue system for a human-like conversational robot ERICA.
In Proc. Int'l Workshop Spoken Dialogue Systems (IWSDS), (keynote speech), 2018.
- K.Inoue, K.Hara, D.Lala, K.Yamamoto, S.Nakamura, K.Takanashi, and T.Kawahara.
Job interviewer android with elaborate follow-up question generation.
In Proc. ICMI, pp.324--332, 2020.
- K.Inoue, D.Lala, K.Yamamoto, S.Nakamura, K.Takanashi, and T.Kawahara.
An attentive listening system with android ERICA: Comparison of
autonomous and WOZ interactions.
In Proc. SIGdial Meeting Discourse & Dialogue, pp.118--127,
2020.
- T.Kawahara, N.Muramatsu, K.Yamamoto, D.Lala, and K.Inoue.
Semi-autonomous avatar enabling unconstrained parallel conversations
--seamless hybrid of WOZ and autonomous dialogue systems--.
Advanced Robotics, Vol.35, No.11, pp.657--663, 2021.
Interaction Analysis and Modeling
- K.Yamamoto, K.Inoue, and T.Kawahara.
Character expression for spoken dialogue systems with semi-supervised learning using variational auto-encoder.
Computer Speech and Language, Vol.79, Article No.101469, 2023.
- K.Inoue, D.Lala, and T.Kawahara.
Can a robot laugh with you?: Shared laughter generation for empathetic spoken dialogue.
Frontiers in Robotics and AI (Computational Intelligence in Robotics), Vol.9, Article 933261, pp.1--11, 2022.
- T.Kawahara, T.Yamaguchi, K.Inoue, K.Takanashi, and N.Ward.
Prediction and generation of backchannel form for attentive listening systems.
In Proc. INTERSPEECH, pp.2890--2894, 2016.
Multi-modal Conversation Analysis
- K.Inoue, D.Lala, K.Takanashi, and T.Kawahara.
Engagement recognition by a latent character model based on
multimodal listener behaviors in spoken dialogue.
APSIPA Trans. Signal & Information Process., Vol.7, No.e9,
pp.1--16, 2018.
- T.Kawahara, T.Iwatate, K.Inoue, S.Hayashi, H.Yoshimoto, and K.Takanashi.
Multi-modal sensing and analysis of poster conversations with smart posterboard.
APSIPA Trans. Signal & Information Process., Vol.5, No.e2, pp.1--12, 2016.
Natural Language Processing for Rich Transcription
- J.Nozaki, T.Kawahara, K.Ishizuka, and T.Hashimoto.
End-to-end speech-to-punctuated-text recognition.
In Proc. INTERSPEECH, pp.1811--1815, 2022.
- M.Mimura, S.Sakai, and T.Kawahara.
An end-to-end model from speech to clean transcript for parliamentary meetings.
In Proc. APSIPA ASC, pp.465--470, 2021.
Computer Assisted Language Learning (CALL)
- R.Duan, T.Kawahara, M.Dantsuji, and H.Nanjo.
Cross-lingual transfer learning of non-native acoustic modeling for
pronunciation error detection and diagnosis.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.28,
No.1, pp.391--401, 2020.
- M.Mirzaei, K.Meshgi, and T.Kawahara.
Exploiting automatic speech recognition errors to enhance partial and synchronized caption for facilitating second language listening.
Computer Speech and Language, Vol.49, pp.17--36, 2018.
Large Vocabulary Continuous Speech Recognition Platform
- A.Lee and T.Kawahara.
Recent development of open-source speech recognition engine Julius.
In Proc. APSIPA ASC, pp.131--137, 2009.
- T.Kawahara, A.Lee, K.Takeda, K.Itou, and K.Shikano.
Recent progress of open-source LVCSR engine Julius and Japanese model repository.
In Proc. ICSLP, pp.3069--3072, 2004.