A Lightweight Interactive Regular Expression Generator for Qualitative Coding in Quantitative Ethnography

  • Conference paper
Advances in Quantitative Ethnography (ICQE 2023)

Abstract

Quantitative ethnography approaches are often used to analyze large-scale qualitative data. Manually coding such data is expensive and time-consuming, if not impractical or impossible. In contrast, machine learning algorithms can code virtually unlimited amounts of data once a model has been created. However, machine learning approaches lack transparency and rely on large amounts of training data. An alternative automated coding approach using regular expressions has the advantage of minimizing required training data while providing transparency. However, manually creating regular expressions during the coding process can be a very challenging task for many researchers. One potential solution to this challenge is automatic regular expression generation. Unfortunately, existing algorithms all rely on large pre-coded training datasets, which are often unavailable in quantitative ethnography tasks. In this paper, we present a lightweight and interactive algorithm that actively constructs regular expression-based coding classifiers with the researcher. We use a simulation on an education dataset to show that the proposed algorithm is promising.
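To make the idea of a regular expression-based coding classifier concrete, the following is a minimal Python sketch. It is not the algorithm presented in the paper, whose details are not given in the abstract. It assumes a simplified loop in which a researcher is shown only those segments the current patterns miss and may supply a key term for each. The names `build_classifier`, `interactive_build`, and `simulated_researcher`, as well as the sample segments, are hypothetical.

```python
import re
from typing import Callable, List, Optional


def build_classifier(patterns: List[str]) -> Callable[[str], bool]:
    """Return a function that codes a segment as True if any pattern matches."""
    if not patterns:
        return lambda text: False
    combined = re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)
    return lambda text: combined.search(text) is not None


def interactive_build(
    segments: List[str],
    ask_researcher: Callable[[str], Optional[str]],
    max_queries: int = 10,
) -> List[str]:
    """Grow a pattern list by querying the researcher on unmatched segments.

    Assumption: the researcher returns a key term when the code applies to
    the segment, or None when it does not.
    """
    patterns: List[str] = []
    queries = 0
    for segment in segments:
        if queries >= max_queries:
            break
        if build_classifier(patterns)(segment):
            continue  # already covered by an existing pattern, no query needed
        queries += 1
        term = ask_researcher(segment)
        if term:
            # Match the term at a word start, allowing suffixes (test, tests, testing).
            patterns.append(r"\b" + re.escape(term) + r"\w*")
    return patterns


# Hypothetical data and a simulated researcher, for illustration only.
segments = [
    "We should test the prototype again",
    "Lunch at noon?",
    "The testing results look good",
    "I like the blue design",
    "Can we run more tests",
]


def simulated_researcher(segment: str) -> Optional[str]:
    return "test" if "test" in segment.lower() else None


patterns = interactive_build(segments, simulated_researcher)
classify = build_classifier(patterns)
print(patterns)
print([classify(s) for s in segments])
```

In this sketch the resulting classifier is simply the list of patterns, so a researcher can read and edit it directly. That readability is the transparency advantage the abstract attributes to regular expression-based coding.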

Acknowledgements

This work was funded in part by the National Science Foundation (DRL-2100320, DRL-2201723, DRL-2225240), the Wisconsin Alumni Research Foundation, and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison. The opinions, findings, and conclusions do not reflect the views of the funding agencies, cooperating institutions, or other individuals.

Author information

Corresponding author

Correspondence to Zhiqiang Cai.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cai, Z., Marquart, C., Eagan, B., Xiao, Y., Williamson Shaffer, D. (2023). A Lightweight Interactive Regular Expression Generator for Qualitative Coding in Quantitative Ethnography. In: Arastoopour Irgens, G., Knight, S. (eds) Advances in Quantitative Ethnography. ICQE 2023. Communications in Computer and Information Science, vol 1895. Springer, Cham. https://doi.org/10.1007/978-3-031-47014-1_31

  • DOI: https://doi.org/10.1007/978-3-031-47014-1_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47013-4

  • Online ISBN: 978-3-031-47014-1

  • eBook Packages: Computer Science, Computer Science (R0)
