UQ-AAS21: A Comprehensive Dataset of Amazon Alexa Skills

  • Conference paper
Advanced Data Mining and Applications (ADMA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13087)

Abstract

Virtual personal assistant (VPA) services have become popular thanks to the convenient voice user interface (VUI) they offer. An ecosystem of service providers, third-party developers, and end users has begun to form around them. Developers create applications and release them through application stores, from which users obtain them and run them on smart devices. This ecosystem is still at an early stage, and considerable research effort is needed to keep its development on a healthy track. Nonetheless, the research community still lacks comprehensive datasets for studying relevant issues, e.g., the freedom from bugs and the quality of the applications, and users’ security and privacy concerns about them. In this work, we aim to build such a dataset for research use. We target Amazon’s VPA service, Alexa, which is the most popular VPA service. We collect 65,195 Alexa applications (or skills) and extract comprehensive information about them, covering 16 attributes in total, including invocation names and user reviews. We present demographic details of the skills and their developers, and conduct preliminary statistical analyses of quality and privacy issues to demonstrate potential uses of our dataset. The dataset and analysis results are released online to facilitate future research: https://github.com/xie00059/Amazon-Alexa-UQ-AAS21-datasets.

Notes

  1. The popularity of a skill is measured by the number of ratings it receives.

Author information

Corresponding author

Correspondence to Yanjun Zhang.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Xie, F., Zhang, Y., Wei, H., Bai, G. (2022). UQ-AAS21: A Comprehensive Dataset of Amazon Alexa Skills. In: Li, B., et al. Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science, vol 13087. Springer, Cham. https://doi.org/10.1007/978-3-030-95405-5_12

  • DOI: https://doi.org/10.1007/978-3-030-95405-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95404-8

  • Online ISBN: 978-3-030-95405-5

  • eBook Packages: Computer Science, Computer Science (R0)
