An Unsupervised Algorithm for Qualitative Coding of Text Data: Artifact Design, Application, and Evaluation

The Next Wave of Sociotechnical Design (DESRIST 2021)

Abstract

This study presents an artifact for qualitative coding of large volumes of text data using automated text mining techniques. Coding is a critical component of qualitative research, where the “gold standard” involves human coders manually assigning codes to text fragments based on their subjective judgment. However, human coding does not scale to large corpora containing millions of lengthy documents. Our proposed method extends recent advances in semantic text similarity using sentence transformers to automate qualitative coding of text for predefined constructs with known operationalizations. It does so by computing cosine similarity scores between individual sentences in the text documents and the construct operationalizations. We illustrate our approach by coding corporate 10-K reports from US SEC filings for two organizational innovation processes: exploration and exploitation.
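The scoring step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a sentence-transformer encoder (so the sketch runs without external models), the construct texts are illustrative keyword lists loosely based on March's (1991) description of exploration and exploitation rather than the operationalizations used in the paper, and the `threshold` value is arbitrary.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a sentence-transformer encoder.

    Returns a sparse bag-of-words count vector; a real pipeline would
    return a dense embedding from a pretrained model instead.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def code_sentences(sentences, constructs, embed=toy_embed, threshold=0.2):
    """Score each sentence against each construct operationalization.

    A sentence receives the best-matching construct as its code, or None
    when the best score falls below the (assumed, arbitrary) threshold.
    """
    construct_vecs = {name: embed(text) for name, text in constructs.items()}
    results = []
    for sentence in sentences:
        vec = embed(sentence)
        scores = {name: cosine(vec, cv) for name, cv in construct_vecs.items()}
        best = max(scores, key=scores.get)
        label = best if scores[best] >= threshold else None
        results.append((sentence, label, scores))
    return results


# Illustrative operationalizations (assumptions, not taken from the paper).
constructs = {
    "exploration": "search variation risk taking experimentation play "
                   "flexibility discovery innovation",
    "exploitation": "refinement choice production efficiency selection "
                    "implementation execution",
}

sentences = [
    "We invest in experimentation and discovery of new products.",
    "Our focus this year was production efficiency and execution.",
    "The annual meeting will be held in May.",
]

results = code_sentences(sentences, constructs)
labels = [label for _, label, _ in results]
```

With these toy inputs, the first sentence is coded as exploration, the second as exploitation, and the third is left uncoded. Passing a different `embed` callable (for example, one wrapping a pretrained sentence encoder that returns vectors compatible with `cosine`) would be the natural way to move from this sketch toward the embedding-based similarity the abstract describes.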



Author information


Corresponding author

Correspondence to Anol Bhattacherjee.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

De Oliveira Silveira, A., Bhattacherjee, A. (2021). An Unsupervised Algorithm for Qualitative Coding of Text Data: Artifact Design, Application, and Evaluation. In: Chandra Kruse, L., Seidel, S., Hausvik, G.I. (eds) The Next Wave of Sociotechnical Design. DESRIST 2021. Lecture Notes in Computer Science, vol 12807. Springer, Cham. https://doi.org/10.1007/978-3-030-82405-1_27


  • DOI: https://doi.org/10.1007/978-3-030-82405-1_27


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-82404-4

  • Online ISBN: 978-3-030-82405-1

  • eBook Packages: Computer Science, Computer Science (R0)
