Structuring Unstructured Data—Or: How Machine Learning Can Make You a Wine Sommelier

Müller, Oliver

doi:10.1007/978-3-030-06234-7_29

Chapter
First Online: 26 January 2019

Abstract

Textual data, for example in the form of e-mails, instant messages, or social media posts, is ubiquitous today. Because textual data typically comes in unstructured formats and is often ambiguous in meaning, it is difficult to analyze with computational tools. However, advances in machine learning and the increasing availability of training data now make it possible to extract useful knowledge from large amounts of unstructured text. In this chapter, we showcase the use of unsupervised machine learning algorithms and visualization techniques to bring structure to, and thereby learn from, more than 100,000 professional wine reviews. This could be useful, for example, when choosing suitable wines for the celebration of your 60th birthday.

Notes

1. All steps of the following analysis were performed in Python (mainly using the spaCy, NLTK, gensim, and scikit-learn packages), and all visualizations were created with Tableau.
2. The dataset can be downloaded at https://www.kaggle.com/zynicide/wine-reviews.
3. For a discussion of the optimal number of topics, see, e.g., Debortoli, Müller, Junglas, & vom Brocke (2016) or Schmiedel, Müller, & vom Brocke (2018).
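
As a rough illustration of what "bringing structure" to review text can mean before any unsupervised algorithm is applied, the sketch below turns a few free-text reviews into a document-term matrix of word counts. It is not the chapter's pipeline: it uses only the Python standard library instead of the packages named in Note 1, and the three reviews, the stop-word list, and the function names are made up for the example.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; the chapter's real data is the Kaggle wine-reviews set.
REVIEWS = [
    "Ripe cherry and plum with soft tannins.",
    "Crisp citrus and green apple, with bright acidity.",
    "Dark cherry, plum and firm tannins on the finish.",
]

# Tiny illustrative stop-word list (an assumption, not the chapter's list).
STOP_WORDS = {"and", "with", "on", "the", "a", "of"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (sorted vocabulary, one row of term counts per document)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


vocab, matrix = document_term_matrix(REVIEWS)

# Print each review as a row of counts aligned with the vocabulary.
print(vocab)
for row in matrix:
    print(row)
```

Each row is now a fixed-length numeric vector, which is the kind of input a topic model or clustering algorithm expects. A real analysis would typically add steps such as lemmatization and a fuller stop-word list.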

References

Blei, D. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
Blei, D., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
BrightLocal. (2014). Local consumer review survey 2014. Retrieved September 19, 2018, from https://www.brightlocal.com/learn/local-consumer-review-survey-2014/.
Debortoli, S., Müller, O., Junglas, I., & vom Brocke, J. (2016). Text mining for information systems researchers: An annotated topic modeling tutorial. Communications of the Association for Information Systems, 39(1).
Dhar, V. (2013). Data science and prediction. Communications of the ACM, 56(12), 64–73.
Fan, W., Wallace, L., Rich, S., & Zhang, Z. (2006). Tapping the power of text mining. Communications of the ACM, 49(9), 76–82.
Frawley, W. J., Piatetsky-Shapiro, G., & Matheus, C. J. (1992). Knowledge discovery in databases: An overview. AI Magazine, 13(3), 57–70.
Friedman, J., Hastie, T., & Tibshirani, R. (2013). The elements of statistical learning. New York: Springer.
Halevy, A., Norvig, P., & Pereira, F. (2009). The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2), 8–12.
Hastie, T., Tibshirani, R., & Friedman, J. (2013). The elements of statistical learning. New York: Springer.
IDC. (2014). The 2014 digital universe study. Retrieved September 19, 2018, from http://www.emc.com/leadership/digital-universe/index.htm#2014.
Jurafsky, D., & Martin, J. H. (2000). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson.
Mudambi, S. M., & Schuff, D. (2010). What makes a helpful online review? A study of customer reviews on Amazon.com. MIS Quarterly, 34(1), 185–200.
Schmiedel, T., Müller, O., & vom Brocke, J. (2018). Topic modeling as a strategy of inquiry in organizational research: A tutorial with an application example on organizational culture. Organizational Research Methods.
Statista. (2017). Number of user reviews and opinions on TripAdvisor worldwide from 2014 to 2017. Retrieved September 19, 2018, from https://www.statista.com/statistics/684862/tripadvisor-number-of-reviews/.

Author information

Authors and Affiliations

Paderborn University, Paderborn, Germany
Oliver Müller

Corresponding author

Correspondence to Oliver Müller.

Editor information

Editors and Affiliations

Department of Information Systems, University of Münster, Münster, Germany
Katrin Bergener
Department of Information Systems, University of Münster, Münster, Germany
Michael Räckers
Department of Information Systems, University of Münster, Münster, Germany
Armin Stein

About this chapter

Cite this chapter

Müller, O. (2019). Structuring Unstructured Data—Or: How Machine Learning Can Make You a Wine Sommelier. In: Bergener, K., Räckers, M., Stein, A. (eds) The Art of Structuring. Springer, Cham. https://doi.org/10.1007/978-3-030-06234-7_29

DOI: https://doi.org/10.1007/978-3-030-06234-7_29
Published: 26 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06233-0
Online ISBN: 978-3-030-06234-7
eBook Packages: Business and Management, Business and Management (R0)
