
Use of Training, Validation, and Test Sets for Developing Automated Classifiers in Quantitative Ethnography

  • Conference paper
  • Advances in Quantitative Ethnography (ICQE 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1112)


Abstract

Using automated classifiers to code discourse data enables researchers to carry out analyses on large datasets. This paper presents a detailed example of applying the training, validation, and test sets frequently used in machine learning to develop automated classifiers for quantitative ethnography research. The method was applied to two dispositional constructs. Within one cycle of the process, reliable and valid automated classifiers were developed for Social Disposition. However, the automated coding scheme for Inclusive Disposition was rejected during the validation stage because of overfitting. Nonetheless, the results demonstrate the potential of preclassified datasets to enhance the efficiency and effectiveness of the automation process.
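The abstract does not spell out the procedure, so what follows is only a minimal Python sketch of the general training/validation/test idea under explicit assumptions: each excerpt is already hand-coded 1/0 for a single construct, the automated classifier is a hypothetical keyword (regular-expression) matcher, agreement with the human codes is measured with Cohen's kappa, and the 60/20/20 split and the 0.65 kappa threshold are illustrative choices, not the authors' settings. The helper names (`three_way_split`, `regex_classifier`, `evaluate_stages`) are hypothetical.

```python
import random
import re
from typing import Callable, Dict, List, Sequence, Tuple

# One hand-coded excerpt: (utterance text, human code), where 1 = construct present.
# This data layout is an assumption for illustration.
Excerpt = Tuple[str, int]


def three_way_split(data: Sequence[Excerpt], train_frac: float = 0.6,
                    val_frac: float = 0.2, seed: int = 0):
    """Shuffle once, then cut into disjoint training, validation and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Chance-corrected agreement between two binary codings of the same excerpts."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("codings must be non-empty and the same length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        # Both coders used a single category throughout: kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


def regex_classifier(patterns: List[str]) -> Callable[[str], int]:
    """Hypothetical keyword classifier: code 1 if any pattern matches the text."""
    compiled = re.compile("|".join(patterns), re.IGNORECASE)
    return lambda text: int(bool(compiled.search(text)))


def agreement(classify: Callable[[str], int], data: Sequence[Excerpt]) -> float:
    """Kappa between the human codes and the classifier's codes on one set."""
    human = [code for _, code in data]
    machine = [classify(text) for text, _ in data]
    return cohens_kappa(human, machine)


def evaluate_stages(classify, train, val, test, threshold: float = 0.65):
    """Score training and validation first; touch the test set only if validation passes."""
    scores: Dict[str, float] = {"train": agreement(classify, train),
                                "validation": agreement(classify, val)}
    # Written as `not >=` so that an undefined (NaN) kappa also counts as a failure.
    if not scores["validation"] >= threshold:
        return "reject at validation", scores
    scores["test"] = agreement(classify, test)
    decision = "accept" if scores["test"] >= threshold else "reject at test"
    return decision, scores
```

In this sketch the patterns would be refined only against the training set. A kappa that looks good on training but falls below the threshold on the held-out validation set is the kind of signal that flags overfitting, and the test set is scored once, only after validation passes, so that its kappa is not inflated by repeated tuning.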



Acknowledgements

The authors gratefully acknowledge funding support from the US National Science Foundation for the work reported in this paper. The views expressed in this paper do not reflect those of the funding agency.

Author information


Corresponding author

Correspondence to Seung B. Lee.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Lee, S.B., Gui, X., Manquen, M., Hamilton, E.R. (2019). Use of Training, Validation, and Test Sets for Developing Automated Classifiers in Quantitative Ethnography. In: Eagan, B., Misfeldt, M., Siebert-Evenstone, A. (eds) Advances in Quantitative Ethnography. ICQE 2019. Communications in Computer and Information Science, vol 1112. Springer, Cham. https://doi.org/10.1007/978-3-030-33232-7_10


  • DOI: https://doi.org/10.1007/978-3-030-33232-7_10


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33231-0

  • Online ISBN: 978-3-030-33232-7

  • eBook Packages: Computer Science, Computer Science (R0)
