
Improving reading comprehension step by step using Online-Boost text readability classification system

  • Original Article
  • Neural Computing and Applications

Abstract

Online reading exercises have become a universal tool across a wide variety of second language learning systems. Readability sorting is a key step in presenting suitable reading materials to learners. Traditional text readability classification techniques do not fully meet the needs of online learning, because they cannot classify in real time and have no access to information about learners' language levels. This paper presents a novel framework for online reading exercises based on the Online-Boost text readability classification algorithm. We first modify the multinomial Naïve Bayes model to assign each reading material an initial readability. We then propose an Online-Boost algorithm that updates text readability and evaluates learners' reading comprehension according to the learners' rate of correct answers on each text. Finally, the system delivers reading materials of different difficulty to testers with different levels of reading ability in real time. The experimental results indicate that the method is easy to use and can significantly improve the performance of second language learners.
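The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' modified Naïve Bayes or their Online-Boost algorithm, neither of which is specified in this preview. The sketch assumes a standard multinomial Naive Bayes with add-one smoothing for the initial readability label, and a hypothetical logistic update rule (`expected_correct_rate`, `update_difficulty`, and the `step` parameter are illustrative names, not from the paper) that moves a text's difficulty when learners' observed correct rate differs from the expected one.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Standard multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in doc.lower().split():
                if w in self.vocab:
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def expected_correct_rate(learner_level, difficulty):
    """Assumed logistic model: stronger learners answer harder texts better."""
    return 1.0 / (1.0 + math.exp(difficulty - learner_level))


def update_difficulty(difficulty, learner_level, correct_rate, step=0.5):
    """Hypothetical online update (an assumption, not the paper's rule):
    raise difficulty when learners score below expectation, lower it otherwise."""
    gap = expected_correct_rate(learner_level, difficulty) - correct_rate
    return difficulty + step * gap


# Toy training data, invented for illustration only.
train_docs = [
    "the cat sat on the mat",
    "the dog ran to the park",
    "epistemological frameworks presuppose ontological commitments",
    "quantum decoherence undermines macroscopic superposition",
]
train_labels = ["easy", "easy", "hard", "hard"]
model = MultinomialNB().fit(train_docs, train_labels)

initial_label = model.predict("the cat ran")
harder = update_difficulty(0.0, learner_level=0.0, correct_rate=0.2)
easier = update_difficulty(0.0, learner_level=0.0, correct_rate=0.9)
```

In this sketch the classifier only supplies a starting label. The update function then adjusts a numeric difficulty as answers arrive, which is the kind of real-time behaviour the abstract says static readability formulas lack.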



Author information


Corresponding author

Correspondence to Lei La.


About this article


Cite this article

La, L., Wang, N. & Zhou, Dp. Improving reading comprehension step by step using Online-Boost text readability classification system. Neural Comput & Applic 26, 929–939 (2015). https://doi.org/10.1007/s00521-014-1770-2


  • DOI: https://doi.org/10.1007/s00521-014-1770-2
