Abstract
Intelligent Personal Assistant (IPA) devices such as Google Home and Amazon Echo have become commodity hardware and are well known to the public. Using these devices as speech-based interfaces to bespoke conversation agent (CA) systems in vocabulary-specific domains exposes their underlying Automatic Speech Recognition (ASR) transcription error rates, which are usually hidden behind probabilistic matching of utterances to intents (slot filling). We present an evaluation of the isolated-word and phrasal recognition rates of these two IPAs, together with an improvement scheme associated with a Contextual Multiple Classification Ripple Down Rules (C-MCRDR) CA knowledge-base system (KBS). When measuring isolated-word word error rates (WER) for a human speaker, Google Home achieved an average WER of 0.082 compared to 0.276 for Amazon Echo. Computer-generated utterances unsurprisingly had much poorer recognition rates, with WER for Google Home and Amazon Echo of 0.155 and 0.502 respectively. For phrasal tests on human-sourced sentences, Google Home had an average WER of 0.066 compared to 0.242 for Amazon Echo. We applied a rule-based transcription error-correcting scheme for isolated words and achieved correct recognition rates of 100% for the Google Home in five of the isolated-word data sets. Across all isolated-word data sets, we improved the initial average WER of 0.082 to 0.0153, a significant decrease of 81.34%.
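The abstract reports results as WER values and a relative decrease, but does not spell out how either was computed. The following is a minimal sketch assuming the standard definition of WER (word-level edit distance divided by the number of reference words) and a simple relative-reduction formula. The function names `word_error_rate` and `relative_reduction` and the example utterances are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed standard WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Hypothetical phrasal example: one substituted word out of five.
    print(word_error_rate("turn on the kitchen light", "turn on a kitchen light"))
    # Average WER figures quoted in the abstract for Google Home isolated words.
    print(round(relative_reduction(0.082, 0.0153), 2))
```

Under these assumptions, the averages quoted in the abstract (0.082 before correction and 0.0153 after) give a relative reduction that rounds to 81.34%, which matches the reported figure. For an isolated-word test the reference has a single word, so the WER of each utterance is 0 for a correct transcription and at least 1 for an incorrect one.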
Acknowledgments
This research was supported by a grant from the Asian Office of Aerospace Research and Development (AOARD). It is also supported by an Australian Government Research Training Program Scholarship and has University of Tasmania Ethics Approval, number H0016281.
Data cited herein has been extracted from the British National Corpus Online service, managed by Oxford University Computing Services on behalf of the BNC Consortium. All rights in the texts cited are reserved.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Herbert, D., Kang, B. (2019). Comparative Analysis of Intelligent Personal Agent Performance. In: Ohara, K., Bai, Q. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2019. Lecture Notes in Computer Science, vol 11669. Springer, Cham. https://doi.org/10.1007/978-3-030-30639-7_11
DOI: https://doi.org/10.1007/978-3-030-30639-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30638-0
Online ISBN: 978-3-030-30639-7
eBook Packages: Computer Science, Computer Science (R0)