ABSTRACT
Voice-activated intelligent assistants, such as Siri, Google Now, and Cortana, are prevalent on mobile devices. However, it is challenging to evaluate them due to the varied and evolving number of tasks supported, e.g., voice command, web search, and chat. Since each task may have its own procedure and a unique form of correct answers, it is expensive to evaluate each task individually. This paper is the first attempt to solve this challenge. We develop consistent and automatic approaches that can evaluate different tasks in voice-activated intelligent assistants. We use implicit feedback from users to predict whether users are satisfied with the intelligent assistant as well as its components, i.e., speech recognition and intent classification. Using this approach, we can potentially evaluate and compare different tasks within and across intelligent assistants according to the predicted user satisfaction rates. Our approach is characterized by an automatic scheme of categorizing user-system interaction into task-independent dialog actions, e.g., the user is commanding, selecting, or confirming an action. We use the action sequence in a session to predict user satisfaction and the quality of speech recognition and intent classification. We also incorporate other features to further improve our approach, including features derived from previous work on web search satisfaction prediction, and those utilizing acoustic characteristics of voice requests. We evaluate our approach using data collected from a user study. Results show our approach can accurately identify satisfactory and unsatisfactory sessions.
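As a rough illustration of how an action sequence might be turned into inputs for a satisfaction predictor, the sketch below counts action n-grams in one session. It is a minimal sketch under stated assumptions, not the paper's actual feature set: the label names (`command`, `select`, `confirm`) are hypothetical stand-ins based on the examples in the abstract, and the function name `action_ngram_features` and the choice of unigram and bigram counts are illustrative only.

```python
from collections import Counter
from typing import Sequence

# Hypothetical action labels, loosely based on the examples in the abstract
# (commanding, selecting, confirming). The paper's real taxonomy may differ.
ACTIONS = ("command", "select", "confirm")


def action_ngram_features(actions: Sequence[str], max_n: int = 2) -> Counter:
    """Count every action n-gram (n = 1..max_n) in one session.

    The resulting counts could serve as input features for any standard
    classifier; which model to use is left open here.
    """
    features: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(actions) - n + 1):
            # Join consecutive actions into a single feature name, e.g. "command|select".
            features["|".join(actions[i:i + n])] += 1
    return features


# Example session (invented for illustration): the user repeats a command,
# then selects an option and confirms it.
session = ["command", "command", "select", "confirm"]
features = action_ngram_features(session)
```

A repeated `command|command` bigram, for instance, might hint that the first request was misrecognized, which is the kind of task-independent signal the abstract alludes to; whether it is actually predictive would have to be checked against labeled sessions.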
Index Terms
- Automatic Online Evaluation of Intelligent Assistants