Abstract
Variation in college application materials related to social stratification is a contentious topic in social science and national discourse in the United States. Research on this topic has begun to use computational methods to examine qualitative materials, such as personal statements and letters of recommendation. Despite the prominence of this topic, few studies have considered a fairly common academic pathway: transferring. Approximately 40% of all college students in the US transfer schools at least once. One quirk of the system is that community college students apply for the same spots as students who are already enrolled in four-year schools and are trying to transfer. How might different aspects of the transfer application itself correlate with institutional stratification and make students more or less distinguishable? We use a dataset of 20,532 transfer admissions essays submitted to the University of California system to describe how transfer applicants vary linguistically, culturally, and narratively with respect to academic pathways and essay prompts. Using a variety of methods for computational text analysis and qualitative coding, we find that essays written by community college students tend to be distinct from those written by university students. However, the strength and character of these results changed with the writing prompt provided to applicants. These results show how some forms of stratification, such as the type of school students attend, inform educational processes intended to equalize opportunity, and how combining computational and human reading might illuminate these patterns.




Data availability
A link to the data and code needed to replicate the study is available in the supplementary materials.
Notes
Gender-neutral term referring to all people who descend from people across Latin America.
Acknowledgements
We thank the Stanford Institute for Social Science Research and their Community College Research Experience program for helping the team come together. We also thank Ben Domingue and Klint Kanopka for their helpful, consistent feedback for the IMV section. We thank anthony lising antonio and the rest of the Student Narratives Lab for their support and feedback. We thank Melissa Mesinas and Rosalía C. Zárate for contributing to a previous version of this work that was presented at the Association for the Study of Higher Education conference in 2020. Finally, we thank the editors and reviewers for great feedback that helped improve the paper.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Alvero, A., Pal, J. & Moussavian, K.M. Linguistic, cultural, and narrative capital: computational and human readings of transfer admissions essays. J Comput Soc Sc 5, 1709–1734 (2022). https://doi.org/10.1007/s42001-022-00185-5
DOI: https://doi.org/10.1007/s42001-022-00185-5