Abstract
Wikipedia has emerged as one of the most prominent sources of information on the Internet. It provides a collaborative platform on which editors contribute and share knowledge. Wikipedia articles have been studied extensively from the editor’s point of view, but Wikipedia has yet to be analyzed from the reader’s perspective. Since Wikipedia serves as an encyclopedia for its users, its role as a tool for acquiring information must be examined. Readability, the ease with which a reader can understand a piece of text, plays a major role in conveying the intended meaning to readers. In this paper, we study the readability of various Wikipedia articles. Besides evaluating the articles against standard readability metrics, we introduce new parameters specifically related to the comprehension of the text in Wikipedia articles. Combined with the standard readability metrics, these new parameters help classify Wikipedia articles into comprehensible and non-comprehensible classes using a support vector machine (SVM) classifier.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Setia, S., Iyengar, S.R.S., Verma, A.A., Dubey, N. (2021). Is Wikipedia Easy to Understand?: A Study Beyond Conventional Readability Metrics. In: Wojtkiewicz, K., Treur, J., Pimenidis, E., Maleszka, M. (eds) Advances in Computational Collective Intelligence. ICCCI 2021. Communications in Computer and Information Science, vol 1463. Springer, Cham. https://doi.org/10.1007/978-3-030-88113-9_14
DOI: https://doi.org/10.1007/978-3-030-88113-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88112-2
Online ISBN: 978-3-030-88113-9
eBook Packages: Computer Science (R0)