Abstract
Online reading exercises have become a widely used tool in a variety of second language learning systems. Readability sorting is a key step in presenting suitable reading materials to learners. Traditional text readability classification techniques do not fully meet the needs of online learning, because they lack real-time classification ability and cannot take learners' language levels into account. This paper presents a novel framework for online reading exercises based on the Online-Boost text readability classification algorithm. We first modified the multinomial Naïve Bayes model to assign an initial readability level to each reading material. We then proposed an Online-Boost algorithm that updates text readability and evaluates learners' reading comprehension according to the proportion of questions on each text that learners answer correctly. Finally, the system delivers reading materials of different difficulty, in real time, to learners with different levels of reading ability. The experimental results show that the method is easy to use and can significantly improve the performance of second language learners.
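The abstract outlines a three-step loop: an initial readability level from a multinomial Naïve Bayes model, an online update of text readability and learner ability driven by answer correctness, and real-time delivery of suitably difficult texts. The following is a minimal Python sketch of that loop under loud assumptions. The classifier is the standard, unmodified multinomial Naïve Bayes with Laplace smoothing, not the authors' modified version. Because the abstract does not say how Online-Boost weights or updates anything, the update step is a swapped-in Elo/Rasch-style rating rule, not the paper's algorithm. All names (`MultinomialNB`, `expected_correct`, `update_ratings`, `pick_text`), the step size `k`, the toy texts and the starting ability are hypothetical.

```python
import math
from collections import Counter


class MultinomialNB:
    """Standard (unmodified) multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d.split()}
        self.counts = {c: Counter() for c in self.classes}
        for d, c in zip(docs, labels):
            self.counts[c].update(d.split())
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(docs)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        # log P(c) plus the sum of log P(w | c) over in-vocabulary tokens
        denom = self.totals[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for w in doc.split():
            if w in self.vocab:
                score += math.log((self.counts[c][w] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


def expected_correct(ability, difficulty):
    # Logistic (Rasch-style) probability that a learner answers correctly
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def update_ratings(ability, difficulty, correct_rate, k=0.5):
    # Hypothetical stand-in for Online-Boost (Elo-style step): move learner
    # ability and text difficulty in opposite directions by the gap between
    # the observed and the expected correct rate.
    error = correct_rate - expected_correct(ability, difficulty)
    return ability + k * error, difficulty - k * error


def pick_text(ability, difficulties):
    # Deliver the text whose current difficulty is closest to the learner's ability
    return min(difficulties, key=lambda t: abs(difficulties[t] - ability))


# Hypothetical toy data: readability levels 1 (easy) and 3 (hard)
train_docs = ["the cat sat on the mat", "a dog ran",
              "photosynthesis converts light energy",
              "mitochondria regulate cellular metabolism"]
train_levels = [1, 1, 3, 3]
nb = MultinomialNB().fit(train_docs, train_levels)

# Initial difficulty of each new text is its predicted readability level
pool = {"t1": "the dog sat", "t2": "cellular energy"}
difficulties = {t: float(nb.predict(text)) for t, text in pool.items()}

ability = 1.0  # hypothetical starting ability on the same scale as levels
chosen = pick_text(ability, difficulties)
# The learner answers every question on the chosen text correctly
ability, difficulties[chosen] = update_ratings(
    ability, difficulties[chosen], correct_rate=1.0)
```

Placing text difficulty and learner ability on one logistic scale lets a single observed correct rate adjust both at once, mirroring the abstract's description of updating readability and evaluating learners from the same answers.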
About this article
Cite this article
La, L., Wang, N. & Zhou, Dp. Improving reading comprehension step by step using Online-Boost text readability classification system. Neural Comput & Applic 26, 929–939 (2015). https://doi.org/10.1007/s00521-014-1770-2
DOI: https://doi.org/10.1007/s00521-014-1770-2