Does Active Learning Reduce Human Coding?: A Systematic Comparison of Neural Network with nCoder

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1785)

Abstract

Quantitative ethnography (QE) studies often involve large datasets that cannot be entirely hand-coded by human raters, so researchers have used supervised machine learning approaches to develop automated classifiers. However, QE researchers are rightly concerned about the amount of human coding needed to develop classifiers that reach the high levels of accuracy QE studies typically require. In this study, we compare a neural network, a powerful traditional supervised learning approach, with nCoder, an active learning technique commonly used in QE studies, to determine which technique requires less human coding to produce a sufficiently accurate classifier. To do this, we constructed multiple training sets from a large dataset used in prior QE studies and designed a Monte Carlo simulation to test the performance of the two techniques systematically. Our results show that nCoder can achieve high predictive accuracy with significantly less human-coded data than a neural network.
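The general shape of such a simulation can be sketched as follows: draw many random training sets of each size from a coded pool, train a classifier on each, and average its agreement with held-out human codes. This is only an illustrative sketch, not the authors' procedure. The synthetic one-feature data, the nearest-centroid classifier (a stand-in for both the neural network and nCoder), the use of Cohen's kappa as the accuracy measure, and all function names are assumptions introduced here; the abstract specifies none of them.

```python
import random
from statistics import mean


def cohen_kappa(truth, pred):
    """Cohen's kappa for two equal-length sequences of 0/1 labels."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, pred)) / n
    p_truth = sum(truth) / n
    p_pred = sum(pred) / n
    expected = p_truth * p_pred + (1 - p_truth) * (1 - p_pred)
    if expected == 1.0:
        return 0.0  # degenerate case (kappa undefined); 0.0 by convention here
    return (observed - expected) / (1 - expected)


def make_dataset(size, rng):
    """Hypothetical synthetic data: one numeric feature and a 0/1 code."""
    data = []
    for _ in range(size):
        label = 1 if rng.random() < 0.3 else 0
        x = rng.gauss(1.5 if label else 0.0, 1.0)
        data.append((x, label))
    return data


def train_centroids(train):
    """Stand-in classifier: predict the class whose training mean is nearer."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        # Only one class was sampled: fall back to predicting that class.
        majority = 1 if len(pos) > len(neg) else 0
        return lambda x: majority
    c_pos, c_neg = mean(pos), mean(neg)
    return lambda x: 1 if abs(x - c_pos) < abs(x - c_neg) else 0


def simulate(train_sizes, replications=200, seed=0):
    """Mean held-out kappa for each training-set size over random draws."""
    rng = random.Random(seed)
    pool = make_dataset(5000, rng)   # items available for "human coding"
    test = make_dataset(1000, rng)   # held-out items with known codes
    truth = [y for _, y in test]
    results = {}
    for n in train_sizes:
        kappas = []
        for _ in range(replications):
            train = rng.sample(pool, n)  # one randomly drawn training set
            predict = train_centroids(train)
            kappas.append(cohen_kappa(truth, [predict(x) for x, _ in test]))
        results[n] = mean(kappas)
    return results


if __name__ == "__main__":
    for n, k in simulate([10, 50, 200, 1000]).items():
        print(f"n={n:5d}  mean kappa={k:.3f}")
```

Comparing two techniques under this design would mean running the same loop once per technique and reading off the smallest training-set size at which each reaches a chosen accuracy threshold.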

Acknowledgements

This work was funded in part by the National Science Foundation (DRL-1661036, DRL-1713110, DRL-2100320), the Wisconsin Alumni Research Foundation, and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison. The opinions, findings, and conclusions do not reflect the views of the funding agencies, cooperating institutions, or other individuals.

Author information

Corresponding author

Correspondence to Jaeyoon Choi.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Choi, J., Ruis, A.R., Cai, Z., Eagan, B., Shaffer, D.W. (2023). Does Active Learning Reduce Human Coding?: A Systematic Comparison of Neural Network with nCoder. In: Damşa, C., Barany, A. (eds) Advances in Quantitative Ethnography. ICQE 2022. Communications in Computer and Information Science, vol 1785. Springer, Cham. https://doi.org/10.1007/978-3-031-31726-2_3

  • DOI: https://doi.org/10.1007/978-3-031-31726-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31725-5

  • Online ISBN: 978-3-031-31726-2

  • eBook Packages: Computer Science, Computer Science (R0)
