DOI: 10.1145/3303772.3303801
research-article

Language as Thought: Using Natural Language Processing to Model Noncognitive Traits that Predict College Success

Published: 04 March 2019

ABSTRACT

It is widely acknowledged that the language we use reflects numerous psychological constructs, including our thoughts, feelings, and desires. Can the so-called "noncognitive" traits with known links to success, such as growth mindset, leadership ability, and intrinsic motivation, be similarly revealed through language? We investigated this question by analyzing students' 150-word open-ended descriptions of their own extracurricular activities or work experiences, which were included in their college applications. We used the Common Application-National Student Clearinghouse dataset, a six-year longitudinal dataset that includes college application data and graduation outcomes for 278,201 U.S. high-school students. We first developed a coding scheme from a stratified sample of 4,000 essays and used it to code seven traits: growth mindset, perseverance, goal orientation, leadership, psychological connection (intrinsic motivation), self-transcendent (prosocial) purpose, and team orientation, along with earned accolades. We then automated the coding using standard classifiers with bag-of-n-grams features and deep learning techniques (recurrent neural networks) with word embeddings. The models demonstrated convergent validity with the human coding, with AUCs ranging from .770 to .925 and correlations ranging from .418 to .734. There was also evidence of discriminant validity in the pattern of inter-correlations (rs between -.206 and .306) for both human- and model-coded traits. Finally, the models demonstrated incremental predictive validity for six-year graduation outcomes, net of sociodemographics, intelligence, academic achievement, and institutional graduation rates. We conclude that language provides a lens into noncognitive traits important for college success, and that these traits can be captured with automated methods.
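The first automation step described above (a standard classifier over bag-of-n-grams features, checked against human codes with AUC and correlation) can be illustrated with a small, dependency-free Python sketch. It is a toy under stated assumptions, not the authors' pipeline: the binary unigram-plus-bigram features, the L2-regularized logistic regression trained by batch gradient descent, the helper names (`ngrams`, `build_vocab`, `vectorize`, `score`, `train_logreg`, `auc`, `pearson`), and the six example essays and codes are all hypothetical. The abstract does not specify the classifiers, preprocessing, or validation split, and the recurrent-network variant with word embeddings is not shown.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Count lowercase word n-grams from unigrams up to n_max-grams."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)


def build_vocab(docs, n_max=2):
    """Assign a column index to every n-gram seen in the training essays."""
    seen = set()
    for doc in docs:
        seen.update(ngrams(doc, n_max))
    return {g: i for i, g in enumerate(sorted(seen))}


def vectorize(text, vocab, n_max=2):
    """Binary bag-of-n-grams vector; n-grams outside the vocabulary are dropped."""
    x = [0.0] * len(vocab)
    for g in ngrams(text, n_max):
        if g in vocab:
            x[vocab[g]] = 1.0
    return x


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Predicted probability that the trait is present in one essay."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=300, l2=0.01):
    """Batch gradient descent on L2-regularized logistic (log) loss."""
    w, b, m = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * (gj / m + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / m
    return w, b


def auc(labels, scores):
    """ROC AUC: chance a positive essay outscores a negative one (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(a, b):
    """Pearson r; against 0/1 human codes this is the point-biserial correlation."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Hypothetical toy essays with human codes for one trait (1 = perseverance coded present).
essays = [
    "I failed at first but kept practicing and learned from every mistake",
    "When the robot broke I never gave up and learned from it",
    "I kept practicing piano every day even when it was hard",
    "I volunteered at the library on weekends",
    "I was a cashier at a grocery store",
    "I played on the soccer team for two years",
]
human = [1, 1, 1, 0, 0, 0]

vocab = build_vocab(essays)
X = [vectorize(e, vocab) for e in essays]
w, b = train_logreg(X, human)
probs = [score(w, b, x) for x in X]
train_auc = auc(human, probs)
train_r = pearson(probs, human)
print(f"training AUC = {train_auc:.3f}, r = {train_r:.3f}")
```

Because this sketch scores the same six essays it was trained on, its numbers say nothing about real performance; a faithful evaluation would compute AUC and correlation on held-out, human-coded essays, and a library implementation (for example scikit-learn) would normally replace the hand-written optimizer. The stdlib version is only meant to make each step visible.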


          Published in
          LAK19: Proceedings of the 9th International Conference on Learning Analytics & Knowledge
          ACM Other conferences, March 2019, 565 pages
          ISBN: 9781450362566
          DOI: 10.1145/3303772

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher: Association for Computing Machinery, New York, NY, United States


          Qualifiers: Research; Refereed limited

          Overall acceptance rate: 236 of 782 submissions (30%)
