Abstract
This study presents an artifact for qualitative coding of large volumes of text data using automated text mining techniques. Coding is a critical component of qualitative research, where the “gold standard” involves human coders manually assigning codes to text fragments based on their subjective judgment. However, human coding does not scale to large corpora containing millions of long documents. Our proposed method builds on recent advances in semantic text similarity with sentence transformers to automate qualitative coding of text for predefined constructs with known operationalizations. Codes are assigned using cosine similarity scores between individual sentences in the text documents and the construct operationalizations. We illustrate our approach by coding corporate 10-K reports from US SEC filings for two organizational innovation processes: exploration and exploitation.
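The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to keep the example runnable without downloading a model, a toy bag-of-words vector stands in for the sentence-transformer embedding, and the names `embed`, `cosine`, and `code_document`, the operationalization phrases, and the sample sentences are all hypothetical. Only the general idea (cosine similarity between each sentence and each construct operationalization) comes from the abstract.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence-transformer encoder (assumption):
    returns a bag-of-words count vector instead of a dense embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_u * norm_v)


def code_document(sentences, operationalizations):
    """Score every sentence against every construct.

    A sentence's score for a construct is its highest cosine similarity
    to any of that construct's operationalization phrases (an assumed
    aggregation rule, chosen here for simplicity).
    """
    results = []
    for sentence in sentences:
        vec = embed(sentence)
        scores = {
            construct: max(cosine(vec, embed(p)) for p in phrases)
            for construct, phrases in operationalizations.items()
        }
        results.append((sentence, scores))
    return results


# Hypothetical operationalizations for the two constructs in the abstract.
OPERATIONALIZATIONS = {
    "exploration": [
        "search for new products and new markets",
        "experimentation with new technologies",
    ],
    "exploitation": [
        "improve efficiency of existing operations",
        "reduce cost of existing products",
    ],
}

# Hypothetical sentences of the kind a 10-K report might contain.
SENTENCES = [
    "We invest in experimentation with new technologies.",
    "We continue to reduce the cost of our existing products.",
]

coded = code_document(SENTENCES, OPERATIONALIZATIONS)
for sentence, scores in coded:
    best = max(scores, key=scores.get)
    print(f"{best}: {sentence}")
```

In a setting closer to the one the abstract describes, `embed` would presumably be replaced by a pretrained sentence-transformer encoder, and a similarity threshold would likely be needed so that sentences unrelated to either construct are left uncoded.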
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
De Oliveira Silveira, A., Bhattacherjee, A. (2021). An Unsupervised Algorithm for Qualitative Coding of Text Data: Artifact Design, Application, and Evaluation. In: Chandra Kruse, L., Seidel, S., Hausvik, G.I. (eds) The Next Wave of Sociotechnical Design. DESRIST 2021. Lecture Notes in Computer Science, vol 12807. Springer, Cham. https://doi.org/10.1007/978-3-030-82405-1_27
DOI: https://doi.org/10.1007/978-3-030-82405-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82404-4
Online ISBN: 978-3-030-82405-1
eBook Packages: Computer Science (R0)