ABSTRACT
Task-oriented chatbots allow users to carry out tasks (e.g., ordering a pizza) using natural language conversation. The widely used slot-filling approach for building bots of this type requires significant hand-coding, which hinders scalability. Recently, neural network models have been shown to be capable of generating natural "chitchat" conversations, but it is unclear whether they will ever work for task modeling. Kite is a practical system for bootstrapping task-oriented bots that leverages both approaches. Kite's key insight is that while bots encapsulate the logic of user tasks in conversational form, existing apps encapsulate the same logic in graphical user interfaces. A developer demonstrates a task using a relevant app, and from the collected interaction traces Kite automatically derives a task model: a graph of actions and associated inputs representing possible task execution paths. The task model forms the logical backbone of a bot, on which Kite layers a question-answer interface generated using a hybrid rule-based and neural network approach. Using Kite, developers can automatically generate bot templates for many different tasks. In our evaluation, Kite extracted accurate task models from 25 popular Android apps spanning 15 tasks, and it generated appropriate questions and high-quality answers. Our developer study suggests that developers, even those without any bot development experience, can successfully generate bot templates using Kite.
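To make the notion of a task model concrete, the following is a minimal sketch of how demonstration traces might be merged into a graph of actions and associated inputs. It is an illustration only, not Kite's actual extraction algorithm: the trace format, the action and input names (`select_pizza`, `pizza_type`, and so on), and the helper functions `build_task_model` and `execution_paths` are all assumptions made for this example.

```python
from collections import defaultdict


def build_task_model(traces):
    """Merge demonstration traces into a directed graph of UI actions.

    Hypothetical trace format: each trace is a list of (action, input_name)
    pairs, where input_name is None if the action collects no user input.
    """
    edges = defaultdict(set)   # action -> actions observed to follow it
    inputs = defaultdict(set)  # action -> names of the inputs it collects
    for trace in traces:
        prev = "START"
        for action, input_name in trace:
            edges[prev].add(action)
            if input_name is not None:
                inputs[action].add(input_name)
            prev = action
        edges[prev].add("END")
    return edges, inputs


def execution_paths(edges, node="START", path=()):
    """Enumerate every START-to-END path (assumes the graph has no cycles)."""
    if node == "END":
        yield list(path)
        return
    for nxt in sorted(edges[node]):
        step = () if nxt == "END" else (nxt,)
        yield from execution_paths(edges, nxt, path + step)


# Two hypothetical demonstrations of a pizza-ordering task.
traces = [
    [("select_pizza", "pizza_type"), ("choose_size", "size"), ("checkout", None)],
    [("select_pizza", "pizza_type"), ("add_topping", "topping"),
     ("choose_size", "size"), ("checkout", None)],
]

edges, inputs = build_task_model(traces)
paths = list(execution_paths(edges))
for p in paths:
    print(" -> ".join(p))
```

In this sketch, each action that collects an input (for example, `choose_size` with `size`) marks a point where a bot would need to ask the user a question, and each enumerated path corresponds to one possible way of completing the task.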
Index Terms
- Kite: Building Conversational Bots from Mobile Apps