Abstract
Textual data, for example in the form of e-mails, instant messages, or social media posts, is ubiquitous today. Because textual data typically comes in unstructured formats and is often ambiguous in meaning, it is difficult to analyze with computational tools. However, advances in machine learning and the increasing availability of training data now make it possible to extract useful knowledge from large amounts of unstructured textual data. In this chapter, we showcase the use of unsupervised machine learning algorithms and visualization techniques to bring structure to, and thereby learn from, more than 100,000 professional wine reviews. This could be useful, for example, when choosing suitable wines for the celebration of your 60th birthday.
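The abstract does not name the specific algorithms used, so the following is only a minimal sketch of the general idea of bringing structure to free text, not the chapter's actual pipeline. It turns each review into a bag-of-words vector and uses cosine similarity to find the most similar review. The four reviews, the stopword list, and all function names are invented for illustration and use only the Python standard library.

```python
import math
import re
from collections import Counter

# Toy reviews invented for this sketch; they are NOT taken from the dataset.
REVIEWS = [
    "Ripe cherry and plum fruit with soft tannins and a long finish.",
    "Crisp citrus and green apple with bright acidity.",
    "Dark cherry, plum and spice, with firm tannins.",
    "Zesty lemon and apple flavors with refreshing acidity.",
]

# A tiny, hypothetical stopword list; real pipelines use much larger ones.
STOPWORDS = {"a", "and", "with", "the", "of"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def bag_of_words(text):
    """Represent a review as a mapping from term to term count."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(index, vectors):
    """Return the index of the review most similar to vectors[index]."""
    scores = [
        (cosine(vectors[index], v), j)
        for j, v in enumerate(vectors)
        if j != index
    ]
    return max(scores)[1]


vectors = [bag_of_words(r) for r in REVIEWS]

for i, review in enumerate(REVIEWS):
    j = most_similar(i, vectors)
    print(f"Review {i} is closest to review {j}: {REVIEWS[j]}")
```

On this toy data the two red-wine-style reviews pair up with each other, as do the two white-wine-style reviews, because they share descriptors such as "cherry" and "tannins" or "apple" and "acidity". With over 100,000 reviews, the same document-term representation would typically feed a clustering or topic-modeling step rather than a pairwise comparison.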
Notes
- 1.
All data analysis steps in this chapter were performed in Python (mainly using the spaCy, NLTK, gensim, and scikit-learn packages), and all visualizations were created with Tableau.
- 2.
The dataset can be downloaded at https://www.kaggle.com/zynicide/wine-reviews.
- 3.
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Müller, O. (2019). Structuring Unstructured Data—Or: How Machine Learning Can Make You a Wine Sommelier. In: Bergener, K., Räckers, M., Stein, A. (eds) The Art of Structuring. Springer, Cham. https://doi.org/10.1007/978-3-030-06234-7_29
DOI: https://doi.org/10.1007/978-3-030-06234-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06233-0
Online ISBN: 978-3-030-06234-7
eBook Packages: Business and Management (R0)