Abstract
Data-mined models often achieve good predictive power, but sometimes at the cost of interpretability. We investigate whether selecting features to increase a model’s construct validity and interpretability can also improve the model’s ability to predict the desired constructs. To do so, we take existing models and reduce their feature sets to increase construct validity. We then compare the predictive capabilities of the existing and reduced models on a held-out test set in two ways. First, we analyze the models’ overall predictive performance. Second, we determine how much student interaction data is necessary to make accurate predictions. We find that the reduced models with higher construct validity not only achieve better overall agreement, but also predict more accurately with less data. This work is conducted in the context of developing models to assess students’ inquiry skill at designing controlled experiments and testing stated hypotheses within a science inquiry microworld.
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sao Pedro, M.A., Baker, R.S.J.d., Gobert, J.D. (2012). Improving Construct Validity Yields Better Models of Systematic Inquiry, Even with Less Information. In: Masthoff, J., Mobasher, B., Desmarais, M.C., Nkambou, R. (eds) User Modeling, Adaptation, and Personalization. UMAP 2012. Lecture Notes in Computer Science, vol 7379. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31454-4_21
DOI: https://doi.org/10.1007/978-3-642-31454-4_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31453-7
Online ISBN: 978-3-642-31454-4
eBook Packages: Computer Science (R0)