Applying Natural Language Processing and Hierarchical Machine Learning Approaches to Text Difficulty Classification

Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.

doi:10.1007/s40593-020-00201-7

ARTICLE
Published: 25 June 2020

Volume 30, pages 337–370, (2020)

International Journal of Artificial Intelligence in Education

Abstract

For decades, educators have relied on readability metrics that tend to oversimplify dimensions of text difficulty. This study examines the potential of applying advanced artificial intelligence methods to the educational problem of assessing text difficulty. The combination of hierarchical machine learning and natural language processing (NLP) is leveraged to predict the difficulty of practice texts used in a reading comprehension intelligent tutoring system, iSTART. Human raters estimated the text difficulty level of 262 texts across two text sets (Set A and Set B) in the iSTART library. NLP tools were used to identify linguistic features predictive of text difficulty and these indices were submitted to both flat and hierarchical machine learning algorithms. Results indicated that including NLP indices and machine learning increased accuracy by more than 10% as compared to classic readability metrics (e.g., Flesch-Kincaid Grade Level). Further, hierarchical outperformed non-hierarchical (flat) machine learning classification for Set B (72%) and the combined set A + B (65%), whereas the non-hierarchical approach performed slightly better than the hierarchical approach for Set A (79%). These findings demonstrate the importance of considering deeper features of language related to text difficulty as well as the potential utility of hierarchical machine learning approaches in the development of meaningful text difficulty classification.
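
The contrast between flat and hierarchical classification described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline (the study used Weka algorithms on NLP indices). The nearest-centroid learner, the two toy features, the four difficulty levels, and the `bands` grouping of levels into "easy" and "hard" are all assumptions made for illustration only.

```python
from statistics import mean

def fit_centroids(X, y):
    """Fit one centroid (column-wise mean vector) per label."""
    return {
        lab: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == lab])]
        for lab in sorted(set(y))
    }

def nearest(x, centroids):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )

class FlatClassifier:
    """Flat approach: choose among all difficulty levels in a single step."""
    def fit(self, X, y):
        self.centroids = fit_centroids(X, y)
        return self

    def predict(self, x):
        return nearest(x, self.centroids)

class HierarchicalClassifier:
    """Hierarchical approach: pick a coarse band, then a level inside it."""
    def __init__(self, bands):
        # bands is a hypothetical grouping: band name -> set of levels
        self.bands = bands
        self.level_to_band = {lvl: b for b, lvls in bands.items() for lvl in lvls}

    def fit(self, X, y):
        coarse = [self.level_to_band[t] for t in y]
        self.top = fit_centroids(X, coarse)
        self.leaf = {}
        for band in self.bands:
            rows = [(x, t) for x, t in zip(X, y) if self.level_to_band[t] == band]
            self.leaf[band] = fit_centroids([r[0] for r in rows], [r[1] for r in rows])
        return self

    def predict(self, x):
        band = nearest(x, self.top)          # stage 1: coarse decision
        return nearest(x, self.leaf[band])   # stage 2: level within the band

# Toy data (invented): [mean sentence length, proportion of frequent words]
X = [[8, 0.90], [9, 0.85], [12, 0.70], [13, 0.72],
     [18, 0.50], [19, 0.55], [25, 0.30], [26, 0.28]]
y = [1, 1, 2, 2, 3, 3, 4, 4]
bands = {"easy": {1, 2}, "hard": {3, 4}}

flat = FlatClassifier().fit(X, y)
hier = HierarchicalClassifier(bands).fit(X, y)
print(flat.predict([8.2, 0.88]), hier.predict([24, 0.30]))
```

In a two-stage design like this, each second-stage model only has to separate neighbouring levels, but a wrong decision at the first stage cannot be corrected later. That trade-off is one plausible reason the two approaches can rank differently across text sets.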

Notes

https://www.scholastic.com/teachers/articles/teaching-content/leveled-reading-systems-explained/
For more on NLP, see McNamara et al. (2018). For a more thorough discussion of Coh-Metrix, see Graesser et al. (2004) and McNamara et al. (2014).
We were unable to locate downloadable software or corpora associated with these studies. Thus, we could not compare our algorithms to those used in these studies. Notably, such a comparison was not the purpose of this study, and its absence does not affect the validity of the previous studies.

References

Allen, L. K., Jacovina, M. E., & McNamara, D. S. (2016). Cohesive features of deep text comprehension processes. In J. Trueswell, A. Papafragou, D. Grodner, & D. Mirman (Eds.), Proceedings of the 38th Annual Meeting of the Cognitive Science Society, Philadelphia, PA (pp. 2681–2686). Austin, TX: Cognitive Science Society.
Allen, L. K., Snow, E. L., & McNamara, D. S. (2015). Are you reading my mind? Modeling students' reading comprehension skills with natural language processing techniques. In J. Baron, G. Lynch, N. Maziarz, P. Blikstein, A. Merceron, & G. Siemens (Eds.), Proceedings of the 5th International Learning Analytics & Knowledge Conference (LAK'15) (pp. 246–254). Poughkeepsie, NY: ACM.
Aggarwal, C. C., & Zhai, C. (2012). A survey of text classification algorithms. In C. Aggarwal & C. Zhai (Eds.), Mining text data. Boston, MA: Springer.
Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database (release 2). Distributed by Linguistic Data Consortium, University of Pennsylvania.
Babbar, R., Partalas, I., Gaussier, E., & Amini, M. R. (2013). On flat versus hierarchical classification in large-scale taxonomies. In Advances in Neural Information Processing Systems. 1824–1832.
Balyan, R., McCarthy, K. S., & McNamara, D. S. (2017). Combining machine learning and natural language processing to assess literary text comprehension. In X. Hu, T. Barnes, A. Hershkovitz, & L. Paquette (Eds.), Proceedings of the 10th International Conference on Educational Data Mining (EDM) (pp. 244–249). Wuhan: International Educational Data Mining Society.
Balyan, R., McCarthy, K. S., & McNamara, D. S. (2018). Comparing machine learning classification approaches for predicting expository text difficulty. In Proceedings of the 31st Annual Florida Artificial Intelligence Research Society International Conference (FLAIRS). AAAI Press.
Begeny, J. C., & Greene, D. J. (2014). Can readability formulas be used to successfully gauge difficulty of reading materials? Psychology in the Schools, 51(2), 198–215.
Benjamin, R. (2012). Reconstructing readability: Recent developments and recommendations in the analysis of text difficulty. Educational Psychology Review, 24(1), 63–88.
Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press.
Bormuth, J. R. (1966). Readability: A new approach. Reading Research Quarterly, 1, 79–132.
Bormuth, J. R. (1969). Development of Readability Analysis. (final report, project no. 7-0052, contract no. OEC-3-7-070052-0326). Retrieved from ERIC database. (ED029166).
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Brunato, D., De Mattei, L., Dell’Orletta, F., Iavarone, B., & Venturi, G. (2018). Is this sentence difficult? Do you agree?. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 2690-2699).
Caruana, R., & Niculescu-Mizil, A. (2006, June). An empirical comparison of supervised learning algorithms. In proceedings of the 23rd international conference on machine learning (pp. 161-168). ACM.
Casasent, D., & Wang, Y.-C. F. (2005). A hierarchical classifier using new support vector machine for automatic target recognition. Neural Networks, 18(5–6), 541–548.
Cerri, R., Barros, R. C., & de Carvalho, A. C. (2015, July). Hierarchical classification of gene ontology-based protein functions with neural networks. In 2015 international joint conference on neural networks (IJCNN) (pp. 1-8). IEEE.
Cesa-Bianchi, N., Gentile, C., & Zaniboni, L. (2006). Incremental algorithms for hierarchical classification. Journal of Machine Learning Research, 7(Jan), 31–54.
Chall, J. S. (1988). The beginning years. In B. L. Zakaluk & S. J. Samuels (Eds.), Readability: Its past, present, and future. Newark, DE: International Reading Association.
Collins-Thompson, K. (2014). Computational assessment of text readability: A survey of current and future research. ITL - International Journal of Applied Linguistics, 165(2), 97–135.
Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology, 33(4), 497–505.
Crossley, S. A., Allen, D., & McNamara, D. S. (2012). Text simplification and comprehensible input: A case for an intuitive approach. Language Teaching Research, 16, 89–108.
Crossley, S. A., Allen, L. K., Snow, E. L., & McNamara, D. S. (2016a). Incorporating learning characteristics into automatic essay scoring models: What individual differences and linguistic features tell us about writing quality. Journal of Educational Data Mining, 8(2), 1–19.
Crossley, S. A., Kyle, K., & Dascalu, M. (2018). The tool for the automatic analysis of cohesion 2.0: Integrating semantic similarity and text overlap. Behavior Research Methods, 1–14.
Crossley, S. A., Kyle, K., & McNamara, D. S. (2016b). The tool for the automatic analysis of text cohesion (TAACO): Automatic assessment of local, global, and text cohesion. Behavior Research Methods, 48(4), 1227–1237. https://doi.org/10.3758/s13428-015-0651-7.
Crossley, S. A., & McNamara, D. S. (2009). Computationally assessing lexical differences in second language writing. Journal of Second Language Writing, 17, 119–135.
Dale, E., & Chall, J. S. (1948). A formula for predicting readability. Educational Research Bulletin, 27(1), 11–28.
Dimitrovski, I., Kocev, D., Loskovska, S., & Džeroski, S. (2011). Hierarchical annotation of medical images. Pattern Recognition, 44(10–11), 2436–2449.
Dufty, D. F., Graesser, A. C., Louwerse, M., & McNamara, D. S. (2006). Assigning grade level to textbooks: Is it just readability? In R. Sun & N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 1251–1256). Austin, TX: Cognitive Science Society.
Dumais, S. T., Platt, J., Heckerman, D., & Sahami, M. (1998). Inductive learning algorithms and representations for text categorization. In Proceedings of the seventh international conference on Information and knowledge management (Bethesda, Maryland, USA, November 02–07, 1998). CIKM’98. ACM, New York, NY, 148–155.
Duran, N., Bellissens, C., Taylor, R., & McNamara, D. S. (2007). Quantifying text difficulty with automated indices of cohesion and semantics. In D. S. McNamara & G. Trafton (Eds.), Proceedings of the 29th annual meeting of the cognitive science society (pp. 233–238). Austin, TX: Cognitive Science Society.
Feng, L., Jansche, M., Huenerfauth, M., & Elhadad, N. (2010, August). A comparison of features for automatic readability assessment. In Proceedings of the 23rd international conference on computational linguistics: Posters, 276–284. Association for Computational Linguistics.
Flesch, R. F. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233.
François, T., & Miltsakaki, E. (2012). Do NLP and machine learning improve traditional readability formulas? In Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations, pages 49–57, Montreal, Canada, Association for Computational Linguistics.
Freund, Y., & Schapire, R. E. (1996, July). Experiments with a new boosting algorithm. In icml (Vol. 96, pp. 148-156).
Fry, E. (2002). Readability versus leveling. Reading Teacher, 56(3), 286–291.
Fuchs, E., Niehaus, I., & Stoletzki, A. (2014). Das Schulbuch in der Forschung. Analysen und Empfehlungen für die Bildungspraxis. Göttingen: V&R unipress.
Gee, J. P. (2004). An introduction to discourse analysis: Theory and method. Routledge.
George-Nektarios, T. (2013). Weka classifiers summary. Athens University of Economics and Business Intracom-Telecom, Athens.
Gilhooly, K. J., & Logie, R. H. (1980). Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words. Behavior Research Methods & Instrumentation, 12(4), 395–427.
Graesser, A. C., McNamara, D. S., & Kulikowich, J. M. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223–234.
Graesser, A. C., McNamara, D. S., Louwerse, M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, and Computers, 36, 193–202.
Gunning, R. (1969). The fog index after twenty years. Journal of Business Communication, 6(2), 3–13.
Hartmann, J., Huppertz, J., Schamp, C., & Heitmann, M. (2019). Comparing automated text classification methods. International Journal of Research in Marketing, 36(1), 20–38.
Heilman, M., Collins-Thompson, K., & Eskenazi, M. (2008). An analysis of statistical models and features for reading difficulty prediction. In Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications. Association for Computational Linguistics, Columbus, OH, USA, 71–79.
Ho, T. K. (1995). Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition (Montreal, QC, august 14–15, 1995). ICDAR’95, IEEE computer society Washington, DC, USA, 278–282.
Jackson, G. T., & McNamara, D. S. (2013). Motivation and performance in a game-based intelligent tutoring system. Journal of Educational Psychology, 105, 1036–1049.
Jiang, Z., Gu, Q., Yin, Y., & Chen, D. (2018, August). Enriching word Embeddings with domain knowledge for readability assessment. In Proceedings of the 27th International Conference on Computational Linguistics, 366–378.
Johnson, A. M., McCarthy, K. S., Kopp, K. J., Perret, C. A., & McNamara, D. S. (2017). Adaptive reading and writing instruction in iSTART and W-Pal. In Proceedings of the 30th Florida Artificial Intelligence Research Society International Conference (FLAIRS). AAAI Press.
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of 10th European Conference on Machine Learning (April 21-23). ECML’98. Springer-Verlag London, UK, 137-142.
Kate, R. J., Luo, X., Patwardhan, S., Franz, M., Florian, R., Mooney, R. J., Roukos, S., & Welty, C. (2010). Learning to predict readability using diverse linguistic features. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 546–554.
Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count and Flesch Reading ease formula) for navy enlisted personnel. Research Branch Report 8–75, Millington, TN: Naval technical training, U. S. Naval Air Station, Memphis, TN.
Klare, G. R. (1974). Assessing readability. Reading Research Quarterly, 10, 62–102.
Klare, G. R. (1984). Readability. In P. D. Pearson, R. Barr, M. L. Kamil, P. Mosenthal, & R. Dykstra (Eds.), Handbook of Reading research (pp. 681–744). New York: Longman.
Kotani, K., Yoshimi, T., & Isahara, H. (2011). A machine learning approach to measurement of text readability for EFL learners using various linguistic features. US-China Education Review B, 6, 767–777.
Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., & Brown, D. (2019). Text classification algorithms: A survey. Information, 10(4), 150.
Krogh, A., & Vedelsby, J. (1994). Neural network ensembles, cross validation, and active learning. In Proceedings of the 7th International Conference on Neural Information Processing Systems (Denver, Colorado). NIPS’94. MIT Press, Cambridge, MA, USA, 231–238.
Kumar, S., Ghosh, J., & Crawford, M. M. (2002). Hierarchical fusion of multiple classifiers for Hyperspectral data analysis. Pattern Analysis and Applications, Spl. Issue on Fusion of Multiple Classifiers, 5(2), 210–220.
Kumar, S., & Ghosh, J. (1999). GAMLS: A generalized framework for associative modular learning systems. In Proceedings of SPIE conference on applications and science of computational intelligence II, SPIE proceedings, Orlando, FL, 3722, 24–35.
Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-based indices of syntactic sophistication (Doctoral Dissertation). Retrieved from http://scholarworks.gsu.edu/alesl_diss/35.
Kyle, K., & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757–786. https://doi.org/10.1002/tesq.194.
Kyle, K., Crossley, S. A., & Berger, C. (2018). The tool for the analysis of lexical sophistication version 2.0. Behavior Research Methods, 50(3), 1030–1046.
Lennon, C., & Burdick, H. (2004). The lexile framework as an approach for reading measurement and success. (electronic publication on www.lexile.com).
Lieberman, M. G., & Morris, J. D. (2014). The precise effect of multicollinearity on classification prediction. Multiple Linear Regression Viewpoints, 40(1), 5–10.
Malvern, D. D., Richards, B. J., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment. Houndmills: Palgrave Macmillan.
Martínez, A. M., & Kak, A. C. (2001). PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 228–233.
Mayne, A., & Perry, R. (2009, March). Hierarchically classifying documents with multiple labels. In 2009 IEEE symposium on computational intelligence and data mining (pp. 133-139). IEEE.
McCallum, A., & Nigam, K. (1998). A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, tech. Rep. WS-98-05, AAAI press.
McCarthy, K. S., Watanabe, M. , Dai, J., & McNamara, D. S. (in press). Personalized learning in iSTART: Past modifications and future design. Journal of Research on Technology in Education.
McCarthy, P. M. (2005). An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD). Dissertation abstracts international, 66, UMI no. 3199485.
McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42, 381–392.
McNamara, D. S., Allen, L. K., McCarthy, S., & Balyan, R. (2018). NLP: Getting computers to understand discourse. In Deep Comprehension (pp. 224–236). Routledge.
McNamara, D. S., Crossley, S. A., & Roscoe, R. D. (2013). Natural language processing in an intelligent writing strategy tutoring system. Behavior Research Methods, 45, 499–515.
McNamara, D. S., Crossley, S. A., Roscoe, R. D., Allen, L. K., & Dai, J. (2015). A hierarchical classification approach to automated essay scoring. Assessing Writing, 23, 35–59.
McNamara, D. S., Graesser, A. C., McCarthy, P., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge: Cambridge University Press.
McNamara, D. S., Graesser, A. C., & Louwerse, M. M. (2012). Sources of text difficulty: Across genres and grades. In J. P. Sabatini, E. Albro, & T. O'Reilly (Eds.), Measuring up: Advances in how we assess reading ability (pp. 89–116). Lanham, MD: R&L Education.
McNamara, D. S., Kintsch, E., Songer, N. B., & Kintsch, W. (1996). Are good texts always better? Interactions of text coherence, background knowledge, and levels of understanding in learning from text. Cognition and Instruction, 14, 1–43.
McNamara, D. S., Levinstein, I. B., & Boonthum, C. (2004). iSTART: Interactive strategy trainer for active reading and thinking. Behavior Research Methods, Instruments, and Computers, 36, 222–233.
Mika, S., Rätsch, G., Weston, J., Schölkopf, B., & Müller, K. R. (1999, August). Fisher discriminant analysis with kernels. In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No. 98TH8468) (pp. 41–48). IEEE.
Millis, K., Magliano, J. P., Wiemer-Hastings, K., Todaro, S., & McNamara, D. S. (2007). Assessing and improving comprehension with latent semantic analysis. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 207–225). Mahwah, NJ: Erlbaum.
National Governors Association Center for Best Practices. (2010). Common Core State Standards. National Governors Association Center for best practices. Washington, D. C: Council of Chief State School Officers.
Ozuru, Y., Dempsey, K., Sayroo, J., & McNamara, D. S. (2005). Effect of text cohesion on comprehension of biology texts. In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th annual conference of the cognitive science society (pp. 1696–1701). Mahwah, NJ: Erlbaum.
Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology. 76, 1-2 (Jan.), 1-25.
Perfetti, C. A., Landi, N., & Oakhill, J. (2005). The acquisition of reading comprehension skill. In M. J. Snowling & C. Hulme (Eds.), The science of Reading: A handbook (pp. 227–247). Oxford: Blackwell.
Perret, C. A., Johnson, A. M., McCarthy, K. S., Guerrero, T. A., & McNamara, D. S. (2017). StairStepper: An adaptive remedial iSTART module. In Proceedings of the 18th International Conference on Artificial Intelligence in Education (AIED), Wuhan, China: Springer.
Pilán, I., Vajjala, S., & Volodina, E. (2016). A readable read: Automatic assessment of language learning materials based on linguistic complexity. International Journal of Computational Linguistics and Applications, 7, 143–159.
Pilán, I., Volodina, E., & Johansson, R. (2014). Rule-based and machine learning approaches for second language sentence-level readability. In Proceedings of the ninth workshop on innovative use of NLP for building educational applications, Baltimore, Maryland USA, 174–184.
Pitler, E., & Nenkova, A. (2008, October). Revisiting readability: A unified framework for predicting text quality. In Proceedings of the conference on empirical methods in natural language processing, 186–195. Association for Computational Linguistics.
Rojas, R. (1996). Neural networks - a systematic introduction. Springer-Verlag, Berlin.
Salsbury, T., Crossley, S. A., & McNamara, D. S. (2011). Psycholinguistic word information in second language oral discourse. Second Language Research, 27, 343–360.
Schapire, R. E., & Singer, Y. (1999). BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2–3), 135–168.
Schölkopf, B., & Smola, A. J. (2002). Learning with kernels. Cambridge, MA: MIT Press.
Schwarm, S. E., & Ostendorf, M. (2005). Reading level assessment using support vector machines and statistical language models. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 523-530). Association for Computational Linguistics.
Schwenker, F. (2000). Hierarchical support vector machines for multiclass pattern recognition. In Proceedings of 4th KES, Brighton, UK, 2, 561–565.
Si, L., & Callan, J. (2001, October). A statistical model for scientific readability. In Proceedings of the tenth international conference on Information and knowledge management (pp. 574-576). ACM.
Snow, E. L., Jacovina, M. E., Jackson, G. T., & McNamara, D. S. (2016). iSTART-2: A reading comprehension and strategy instruction tutor. In Adaptive educational technologies for literacy instruction, D.S. McNamara and S. A. Crossley, Eds., Taylor and Francis, Routledge: NY, 104-121.
Stenner, A. J., Horabin, I., Smith, D. R., & Smith, M. (1988). The lexile framework. Durham, NC: MetaMetrics.
Sun, A. & Lim, E. P. (2001). Hierarchical text classification and evaluation. In proceedings of the IEEE international conference on data mining (ICDM 2001), San Jose, CA, USA, 29 November–2 December 2001; pp. 521–528.
Sung, Y. T., Chen, J. L., Cha, J. H., Tseng, H. C., Chang, T. H., & Chang, K. E. (2015). Constructing and validating readability models: The method of integrating multilevel linguistic features with machine learning. Behavior Research Methods, 47(2), 340–354.
Tanaka-Ishii, K., Tezuka, S., & Terada, H. (2010). Sorting by readability. Computational Linguistics, 36(2), 203–227.
Toglia, M. P., & Battig, W. F. (1978). Handbook of semantic word norms. Lawrence Erlbaum.
Triguero, I., & Vens, C. (2016). Labelling strategies for hierarchical multi-label classification techniques. Pattern Recognition, 56, 170–183.
Vajjala, S., & Meurers, D. (2012, June). On improving the accuracy of readability classification using insights from second language acquisition. In proceedings of the seventh workshop on building educational applications using NLP (pp. 163-173). Association for Computational Linguistics.
van Dijk, T. A. (1985). Semantic discourse analysis. In T. van Dijk (Ed.), Handbook of discourse analysis (Vol. 2, pp. 103–136). London: Academic Press.
Vygotsky, L. (1978) Mind in society: The development of higher psychological processes. (M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Trans.). Cambridge, MA: Harvard University Press.
Wang, Y.-C. F., & Casasent, D. (2009). A support vector hierarchical method for multi-class classification and rejection. In Proceedings of International Joint Conference on Neural Networks, Atlanta, Georgia, USA, June, 14-19, 3281–3288.
Witten, I. H., Frank, E., Trigg, L. E., Hall, M. A., Holmes, G., & Cunningham, S. J. (1999). Weka: Practical machine learning tools and techniques with Java implementations.
Zhang, G. P. (2000). Neural networks for classification: A survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 30(4), 451–462.
Zimek, A., Buchwald, F., Frank, E., & Kramer, S. (2008). A study of hierarchical and flat classification of proteins. IEEE Transactions on Computational Biology and Bioinformatics, 7(3), 563–571.
Zipf, G. K. (1949). Human behavior and the principle of least effort. Reading, MA: Addison-Wesley.

Acknowledgements

The authors would like to recognize the support of the Institute of Education Sciences, U.S. Department of Education, through Grants R305A180261, R305A190050 and R305A180144, and the Office of Naval Research, through Grant N000141712300, to Arizona State University. The opinions expressed are those of the authors and do not represent views of the Institute, the U.S. Department of Education, or the Office of Naval Research.

Author information

Authors and Affiliations

Ira A. Fulton School of Engineering, Arizona State University, Mesa, AZ, USA
Renu Balyan
Department of Learning Sciences, Georgia State University, Atlanta, GA, USA
Kathryn S. McCarthy
Department of Psychology, Arizona State University, Tempe, AZ, USA
Danielle S. McNamara

Corresponding author

Correspondence to Renu Balyan.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Table 8 The evaluation Metrics (Set A) using FKGL

Table 9 The evaluation Metrics (Set B) using FKGL

Table 10 The evaluation Metrics (Set A + Set B) using FKGL

Table 11 The evaluation Metrics for Set A using FKGL+

Table 12 The evaluation Metrics for Set B using FKGL+

Table 13 The evaluation Metrics for Set A + Set B using FKGL+
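
The FKGL baseline named in the table captions above is the Flesch-Kincaid Grade Level, defined as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal sketch of that computation follows. The regular-expression tokenizer and the vowel-group syllable counter are rough assumptions for illustration, not the tooling used in the study, and the scores they give can differ from those of established readability software.

```python
import re

def count_syllables(word):
    """Heuristic syllable count (an assumption): groups of consecutive vowels."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    # Treat a trailing silent "e" as non-syllabic, except in "-le" endings.
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def fkgl(text):
    """Flesch-Kincaid Grade Level of a text, using the heuristics above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

easy = "The cat sat. The dog ran."
hard = "Comprehension necessitates elaborate inferential processing."
print(round(fkgl(easy), 2), round(fkgl(hard), 2))
```

Because the formula depends only on sentence length and syllable counts, two texts with the same surface statistics receive the same grade regardless of cohesion or vocabulary familiarity. The abstract cites this limitation as the motivation for adding NLP indices.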

Appendix B

Table 14 Machine Learning algorithms used in the study (Weka 3.8.1)

Appendix C

Table 15 Maximum and Minimum un-normalized values for Linguistic indices (Set A)

Table 16 Maximum and Minimum un-normalized values for Linguistic indices (Set B)

Table 17 Maximum and Minimum un-normalized values for Linguistic indices (Set A + Set B)

Appendix D

Table 18 Text Difficulty Level-wise Performance metrics

Table 19 Confusion Matrix for Set A

Table 20 Confusion Matrix for Set B

Table 21 Confusion Matrix for the Combined Set (A + B)

About this article

Cite this article

Balyan, R., McCarthy, K.S. & McNamara, D.S. Applying Natural Language Processing and Hierarchical Machine Learning Approaches to Text Difficulty Classification. Int J Artif Intell Educ 30, 337–370 (2020). https://doi.org/10.1007/s40593-020-00201-7

Published: 25 June 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s40593-020-00201-7
