Comparison of Speech Recognition Performance Between Kaldi and Google Cloud Speech API

Kimura, Takashi; Nose, Takashi; Hirooka, Shinji; Chiba, Yuya; Ito, Akinori

doi:10.1007/978-3-030-03748-2_13

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 110)

Included in the following conference series:

International Conference on Intelligent Information Hiding and Multimedia Signal Processing

Abstract

In recent years, the number of systems with a speech interface has grown. Such interfaces include a spoken dialogue function, and high performance is required of the spoken dialogue system. A spoken dialogue system includes a speech recognition module. In this study, we focus on the speech recognition module and aim to improve the spoken dialogue system by enhancing the performance of speech recognition. Among speech recognition systems, Kaldi is widely used in many kinds of research. On the other hand, several speech recognition services are provided as Web APIs, such as IBM Watson Speech to Text, Microsoft Bing Speech API, and Google Cloud Speech API, the last of which is known for its high performance. This paper compares the speech recognition performance of Kaldi and Google Cloud Speech API in terms of word error rate (WER) and real-time factor (RTF) and confirms the recognition performance of each system.
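
As an illustration of the two metrics named in the abstract, the following minimal Python sketch computes WER as the token-level edit distance divided by the reference length, and RTF as processing time divided by audio duration. These are the standard definitions; the scoring setup actually used in the paper is not described in the abstract. The function names and the whitespace tokenization are assumptions made for illustration, and Japanese text would first need to be segmented into words or characters.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Assumption: both strings are already tokenized with spaces.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent recognizing / duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

A lower WER means fewer recognition errors, and an RTF below 1 means recognition runs faster than the audio plays. For a Web API, the measured processing time would also include network latency, so the two systems' RTF values are only comparable if timing is defined the same way for both.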

Acknowledgment

Part of this work was supported by JSPS KAKENHI Grant Number JP17H00823.

Author information

Authors and Affiliations

Graduate School of Engineering, Tohoku University, 6-6-04, Aramaki Aza Aoba Aoba-ku, Sendai-shi, Miyagi, 980-8579, Japan
Takashi Kimura, Takashi Nose, Yuya Chiba & Akinori Ito
R&D Center, Hmcomm Co., Ltd., 2-11-1, Shibadaimon, Minato-ku, Tokyo, 105-0012, Japan
Shinji Hirooka
Faculty of Science, Chiba University, 1-33, Yayoi-cho, Inage-ku, Chiba-shi, Chiba, 263-8522, Japan
Shinji Hirooka

Corresponding author

Correspondence to Akinori Ito.

Editor information

Editors and Affiliations

College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian, China
Jeng-Shyang Pan
Graduate School of Engineering, Tohoku University, Sendai, Miyagi, Japan
Akinori Ito
Swinburne University of Technology, Hawthorn, VIC, Australia
Pei-Wei Tsai
Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Kimura, T., Nose, T., Hirooka, S., Chiba, Y., Ito, A. (2019). Comparison of Speech Recognition Performance Between Kaldi and Google Cloud Speech API. In: Pan, JS., Ito, A., Tsai, PW., Jain, L. (eds) Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2018. Smart Innovation, Systems and Technologies, vol 110. Springer, Cham. https://doi.org/10.1007/978-3-030-03748-2_13

DOI: https://doi.org/10.1007/978-3-030-03748-2_13
Published: 11 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03747-5
Online ISBN: 978-3-030-03748-2
eBook Packages: Intelligent Technologies and Robotics (R0)
