Measuring Content Complexity of Technical Texts: Machine Learning Experiments

  • Conference paper
Artificial Intelligence in Education (AIED 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11626)

Abstract

Classifying texts by their content complexity is important for applications like adaptive foreign language reading recommender systems and information retrieval. The goal of this paper is to propose a computational model of technical texts’ content complexity based on three criteria: knowledge depth, required knowledge, and content focus. To implement this model, 28 features of content and lexical complexity were extracted from 1702 texts of three types: general blogs, science journalistic texts and research papers. The machine learning experiments showed that content features alone can provide high classification accuracy.

Notes

  1. https://www.kaggle.com/rtatman/blog-authorship-corpus

  2. CORE (COnnecting REpositories) is an aggregation of papers from open access journals: https://www.jisc.ac.uk/core

  3. Based on the shortest path that connects the senses and the maximum depth of the hierarchy in which the senses occur.

  4. http://websites.psychology.uwa.edu.au/school/MRCDatabase/uwa_mrc.htm

Author information

Corresponding author

Correspondence to M. Zakaria Kurdi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Kurdi, M.Z. (2019). Measuring Content Complexity of Technical Texts: Machine Learning Experiments. In: Isotani, S., Millán, E., Ogan, A., Hastings, P., McLaren, B., Luckin, R. (eds) Artificial Intelligence in Education. AIED 2019. Lecture Notes in Computer Science, vol 11626. Springer, Cham. https://doi.org/10.1007/978-3-030-23207-8_28

  • DOI: https://doi.org/10.1007/978-3-030-23207-8_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-23206-1

  • Online ISBN: 978-3-030-23207-8

  • eBook Packages: Computer Science (R0)
