Use of Training, Validation, and Test Sets for Developing Automated Classifiers in Quantitative Ethnography

Lee, Seung B.; Gui, Xiaofan; Manquen, Megan; Hamilton, Eric R.

doi:10.1007/978-3-030-33232-7_10


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1112)

Included in the following conference series:

International Conference on Quantitative Ethnography


Abstract

Using automated classifiers to code discourse data enables researchers to carry out analyses on large datasets. This paper presents a detailed example of applying the training, validation, and test sets frequently used in machine learning to develop automated classifiers for quantitative ethnography research. The method was applied to two dispositional constructs. Within one cycle of the process, reliable and valid automated classifiers were developed for Social Disposition. However, the automated coding scheme for Inclusive Disposition was rejected during the validation stage because of overfitting. Nonetheless, the results demonstrate the potential of preclassified datasets to enhance the efficiency and effectiveness of the automation process.
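The general workflow the abstract describes can be illustrated with a short example. Coded data is divided into three disjoint sets. A classifier is built against the training set and checked on the validation set, where a drop in agreement signals overfitting. The test set is held back for a final estimate. The Python sketch below is only an illustration, not the authors' pipeline. Every name, toy utterance, the 60/20/20 split, the regex keyword classifier, and the 0.65 kappa threshold are hypothetical choices. Cohen's kappa stands in as the agreement statistic between human and automated codes.

```python
import random
import re
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical record: (utterance text, human-assigned binary code).
Example = Tuple[str, int]


def split_three_ways(data: Sequence[Example], train_frac: float = 0.6,
                     val_frac: float = 0.2, seed: int = 0):
    """Shuffle once, then cut into disjoint training, validation, and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two binary raters coding the same items."""
    if not a or len(a) != len(b):
        raise ValueError("need two equal-length, non-empty code lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used one label throughout
    return (observed - expected) / (1 - expected)


def keyword_classifier(patterns: List[str]) -> Callable[[str], int]:
    """Hypothetical classifier: code 1 if any regex matches (case-insensitive)."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return lambda text: int(any(c.search(text) for c in compiled))


def kappa_on(classifier: Callable[[str], int], examples: Sequence[Example]) -> float:
    """Agreement between human codes and the classifier on one data set."""
    human = [code for _, code in examples]
    machine = [classifier(text) for text, _ in examples]
    return cohens_kappa(human, machine)


def evaluate(patterns: List[str], train, val, test, threshold: float = 0.65):
    """Check training, then validation; touch the test set only if validation passes."""
    clf = keyword_classifier(patterns)
    k_train = kappa_on(clf, train)
    k_val = kappa_on(clf, val)
    k_test: Optional[float] = None
    if k_val < threshold:
        return "rejected at validation", k_train, k_val, k_test
    k_test = kappa_on(clf, test)
    status = "accepted" if k_test >= threshold else "rejected at test"
    return status, k_train, k_val, k_test


# Invented toy utterances for illustration only; not data from the study.
TRAIN = [("we should all help each other", 1), ("thanks for sharing", 1),
         ("the motor is broken", 0), ("I finished the code", 0)]
VAL = [("can you help me", 1), ("thank you", 1),
       ("the sensor works", 0), ("see you tomorrow", 0)]
TEST = [("happy to help", 1), ("the battery died", 0)]

if __name__ == "__main__":
    # General keywords carry over from training to validation and test.
    print(evaluate([r"\bhelp\b", r"\bthanks?\b"], TRAIN, VAL, TEST))
    # Memorized training phrases: perfect on training, zero agreement on validation.
    print(evaluate(["we should all help each other", "thanks for sharing"],
                   TRAIN, VAL, TEST))
```

The second call shows, in miniature, how a coding scheme can look perfect on the data it was built from yet fail on held-out data. Keeping the test set untouched until validation passes means that failures of this kind are caught before the final reliability estimate is made.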



Acknowledgements

The authors gratefully acknowledge funding support from the US National Science Foundation for the work this paper reports. Views appearing in this paper do not reflect those of the funding agency.

Author information

Authors and Affiliations

Pepperdine University, Malibu, CA, 90263, USA
Seung B. Lee, Xiaofan Gui, Megan Manquen & Eric R. Hamilton


Corresponding author

Correspondence to Seung B. Lee.

Editor information

Editors and Affiliations

University of Wisconsin–Madison, Madison, WI, USA
Brendan Eagan
University of Copenhagen, Copenhagen, Denmark
Morten Misfeldt
University of Wisconsin–Madison, Madison, WI, USA
Amanda Siebert-Evenstone


About this paper

Cite this paper

Lee, S.B., Gui, X., Manquen, M., Hamilton, E.R. (2019). Use of Training, Validation, and Test Sets for Developing Automated Classifiers in Quantitative Ethnography. In: Eagan, B., Misfeldt, M., Siebert-Evenstone, A. (eds) Advances in Quantitative Ethnography. ICQE 2019. Communications in Computer and Information Science, vol 1112. Springer, Cham. https://doi.org/10.1007/978-3-030-33232-7_10


DOI: https://doi.org/10.1007/978-3-030-33232-7_10
Published: 11 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33231-0
Online ISBN: 978-3-030-33232-7
eBook Packages: Computer Science, Computer Science (R0)
