ABSTRACT
It is widely acknowledged that the language we use reflects numerous psychological constructs, including our thoughts, feelings, and desires. Can the so-called "noncognitive" traits with known links to success, such as growth mindset, leadership ability, and intrinsic motivation, be similarly revealed through language? We investigated this question by analyzing students' 150-word open-ended descriptions of their own extracurricular activities or work experiences included in their college applications. We used the Common Application-National Student Clearinghouse dataset, a six-year longitudinal dataset that includes college application data and graduation outcomes for 278,201 U.S. high-school students. We first developed a coding scheme from a stratified sample of 4,000 essays and used it to code seven traits: growth mindset, perseverance, goal orientation, leadership, psychological connection (intrinsic motivation), self-transcendent (prosocial) purpose, and team orientation, along with earned accolades. Then, to automate the coding, we used standard classifiers with bag-of-n-grams features and deep learning techniques (recurrent neural networks) with word embeddings. The models demonstrated convergent validity with the human coding, with AUCs ranging from .770 to .925 and correlations ranging from .418 to .734. There was also evidence of discriminant validity in the pattern of inter-correlations (rs ranging from -.206 to .306) for both human- and model-coded traits. Finally, the models demonstrated incremental predictive validity in predicting six-year graduation outcomes net of sociodemographics, intelligence, academic achievement, and institutional graduation rates. We conclude that language provides a lens into noncognitive traits important for college success, and that these traits can be captured with automated methods.
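As a rough illustration of two ideas named in the abstract, the sketch below builds bag-of-n-grams features from short texts and computes an AUC against human codes. It is a minimal sketch under stated assumptions, not the authors' pipeline: the four essays, the human codes, and the hand-picked cue list `LEADERSHIP_CUES` are invented for illustration, and the cue-counting score is only a stand-in for a trained classifier.

```python
from collections import Counter


def ngrams(text, n_max=2):
    """Count word n-grams of length 1..n_max in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes at least one positive and one negative label.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: NOT drawn from the Common Application dataset.
essays = [
    "I led the team as captain",
    "I played piano every day",
    "As team captain I organized practice",
    "I worked at a bakery",
]
human_codes = [1, 0, 1, 0]  # invented human codes for a "leadership" trait

# Hypothetical cue n-grams standing in for a trained classifier's weights.
LEADERSHIP_CUES = {"led", "captain", "team captain"}

# Score each essay by how many cue n-grams it contains.
scores = [sum(ngrams(e)[cue] for cue in LEADERSHIP_CUES) for e in essays]

print("scores:", scores)
print("AUC vs. human codes:", auc(human_codes, scores))
```

In practice the cue list would be replaced by a classifier fitted on the human-coded sample, and the AUC would be computed on held-out essays rather than on the texts used to choose the cues.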
Index Terms
- Language as Thought: Using Natural Language Processing to Model Noncognitive Traits that Predict College Success