LSTM Neural Network Assisted Regex Development for Qualitative Coding

  • Conference paper
Advances in Quantitative Ethnography (ICQE 2022)

Abstract

Automated qualitative coding based on regular expressions (regex) reduces the effort researchers spend manually coding text data without sacrificing the transparency of the coding process. However, researchers using regex-based approaches often struggle with low recall, that is, a high false negative rate, during classifier development. Advanced natural language processing techniques, such as topic modeling, latent semantic analysis, and neural network classification models, help address this problem in various ways. The latest advance in this direction is the discovery of the so-called “negative reversion set (NRS)”, in which false negative items appear more frequently than in the negative set. This helps developers of regex classifiers identify missing items more quickly and thus improve classification recall. This paper simulates the use of NRS in real coding scenarios and compares the number of items that must be manually coded under NRS sampling and under random sampling during classifier refinement. Results on one data set with 50,818 items and six associated qualitative codes show that, on average, NRS sampling reduces the required manual coding size by 50% to 63% compared with random sampling.
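
As a rough illustration of the recall problem and of why a set with a higher density of false negatives saves manual coding, consider the following minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the paper: the toy items, the hypothetical "cost" code and its regex, the hand-assigned `model_score` values (a stand-in for whatever model would be used to build the NRS), and the 0.5 threshold. It treats the NRS as the regex-negative items that the stand-in model scores highly, which is one plausible reading and not a confirmed description of the authors' method.

```python
import re

# Hypothetical toy data, not from the paper: (text, human_label, model_score).
# model_score is a made-up stand-in for a model used to build the NRS.
ITEMS = [
    ("we need to lower the cost of the device", 1, 0.95),
    ("the budget is too tight for that material", 1, 0.90),
    ("that option is too expensive", 1, 0.85),
    ("the price went up again", 1, 0.70),
    ("the surface looks smooth", 0, 0.60),
    ("let's meet at noon", 0, 0.10),
    ("I agree with the plan", 0, 0.20),
    ("who is writing the report", 0, 0.05),
]

# A deliberately incomplete regex for a hypothetical "cost" code.
COST_REGEX = re.compile(r"\b(cost|budget)\b", re.IGNORECASE)


def regex_code(text):
    """Return 1 if the regex classifier assigns the code, else 0."""
    return 1 if COST_REGEX.search(text) else 0


def recall(items):
    """Share of human-coded positives that the regex also codes positive."""
    positives = [it for it in items if it[1] == 1]
    hits = [it for it in positives if regex_code(it[0]) == 1]
    return len(hits) / len(positives) if positives else 0.0


def false_negative_density(regex_negative_items):
    """Share of regex-negative items that the human coder marked positive."""
    if not regex_negative_items:
        return 0.0
    return sum(label for _, label, _ in regex_negative_items) / len(regex_negative_items)


# Negative set: everything the regex classifier codes as 0.
negative_set = [it for it in ITEMS if regex_code(it[0]) == 0]

# Assumed NRS: regex-negative items that the stand-in model scores >= 0.5.
nrs = [it for it in negative_set if it[2] >= 0.5]

print("recall:", recall(ITEMS))
print("false negative density, negative set:", false_negative_density(negative_set))
print("false negative density, assumed NRS:", false_negative_density(nrs))
```

In this toy setup a coder drawing items from the assumed NRS meets a missed positive more often than one drawing at random from the whole negative set, which is the mechanism by which NRS sampling could reduce the amount of manual coding needed to refine the regex.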

Supported by Natural Science Foundation

Acknowledgements

This work was funded in part by the National Science Foundation (DRL-1661036, DRL-1713110, DRL-2100320, LDI-1934745), the Wisconsin Alumni Research Foundation, and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison. The opinions, findings, and conclusions do not reflect the views of the funding agencies, cooperating institutions, or other individuals.

Corresponding author

Correspondence to Zhiqiang Cai.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cai, Z., Eagan, B., Marquart, C., Shaffer, D.W. (2023). LSTM Neural Network Assisted Regex Development for Qualitative Coding. In: Damşa, C., Barany, A. (eds) Advances in Quantitative Ethnography. ICQE 2022. Communications in Computer and Information Science, vol 1785. Springer, Cham. https://doi.org/10.1007/978-3-031-31726-2_2

  • DOI: https://doi.org/10.1007/978-3-031-31726-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31725-5

  • Online ISBN: 978-3-031-31726-2

  • eBook Packages: Computer Science, Computer Science (R0)
