Language as Thought: Using Natural Language Processing to Model Noncognitive Traits that Predict College Success

Authors:
Cathlyn Stone (University of Colorado Boulder, Boulder, CO, USA)
Abigail Quirk (Character Lab, Philadelphia, PA, USA)
Margo Gardener (Teachers College, Columbia University, New York, NY, USA)
Stephen Hutt (University of Colorado Boulder, Boulder, CO, USA)
Angela L. Duckworth (University of Pennsylvania and Character Lab, Philadelphia, PA, USA)
Sidney K. D'Mello (University of Colorado Boulder, Boulder, CO, USA)

LAK19: Proceedings of the 9th International Conference on Learning Analytics & Knowledge, March 2019, Pages 320–329. https://doi.org/10.1145/3303772.3303801

Published: 04 March 2019

ABSTRACT

It is widely acknowledged that the language we use reflects numerous psychological constructs, including our thoughts, feelings, and desires. Can the so-called "noncognitive" traits with known links to success, such as growth mindset, leadership ability, and intrinsic motivation, be similarly revealed through language? We investigated this question by analyzing students' 150-word open-ended descriptions of their own extracurricular activities or work experiences included in their college applications. We used the Common Application-National Student Clearinghouse dataset, a six-year longitudinal dataset that includes college application data and graduation outcomes for 278,201 U.S. high-school students. We first developed a coding scheme from a stratified sample of 4,000 essays and used it to code seven traits: growth mindset, perseverance, goal orientation, leadership, psychological connection (intrinsic motivation), self-transcendent (prosocial) purpose, and team orientation, along with earned accolades. Then, to automate the coding, we used standard classifiers with bag-of-n-grams as features and deep learning techniques (recurrent neural networks) with word embeddings. The models demonstrated convergent validity with the human coding, with AUCs ranging from .770 to .925 and correlations ranging from .418 to .734. There was also evidence of discriminant validity in the pattern of inter-correlations (rs ranging from -.206 to .306) for both human- and model-coded traits. Finally, the models demonstrated incremental predictive validity in predicting six-year graduation outcomes net of sociodemographics, intelligence, academic achievement, and institutional graduation rates. We conclude that language provides a lens into noncognitive traits important for college success, and that these traits can be captured with automated methods.
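
As a rough illustration of two ideas mentioned in the abstract, the sketch below shows how a short text might be turned into bag-of-n-grams counts, and how agreement between human codes and model scores could be summarized with an AUC and a correlation. It is a minimal sketch using only the Python standard library, not the authors' pipeline: the function names, the example sentence, and the toy codes and scores are all hypothetical.

```python
from collections import Counter
from math import sqrt


def bag_of_ngrams(text, n_max=2):
    """Count all word n-grams of length 1..n_max in a text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive example is scored above a randomly chosen
    negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: human codes (1 = trait present) and model scores.
human_codes = [1, 0, 1, 0, 1, 0]
model_scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]

features = bag_of_ngrams("I led the robotics team")  # unigrams and bigrams
trait_auc = auc(human_codes, model_scores)
trait_r = pearson_r(human_codes, model_scores)
```

In this toy setup, `features` would be the input to a classifier, while `trait_auc` and `trait_r` play the role of the convergent-validity statistics reported for each trait. A real analysis would fit and evaluate the classifier on held-out essays.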

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        1. Supervised learning by classification
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Information extraction

Published in
LAK19: Proceedings of the 9th International Conference on Learning Analytics & Knowledge
March 2019
565 pages
ISBN: 9781450362566
DOI: 10.1145/3303772
General Chairs: Sharon Hsiao (Arizona State University, USA), Jim Cunningham (Arizona State University, USA), Katie McCarthy (Georgia State University, USA), Grace Lynch (Society for Learning Analytics Research, Australia)
Program Chairs: Christopher Brooks (University of Michigan, USA), Rebecca Ferguson (The Open University, UK), Ulrich Hoppe (University of Duisburg-Essen, Germany)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
College Success
Common App
Deep learning
Natural Language Processing
Neural Networks
Noncognitive traits
n-grams
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 236 of 782 submissions, 30%