Is Wikipedia Easy to Understand?: A Study Beyond Conventional Readability Metrics

Setia, Simran; Iyengar, S. R. S.; Verma, Amit Arjun; Dubey, Neeru

doi:10.1007/978-3-030-88113-9_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1463)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

Wikipedia has emerged as one of the most prominent sources of information available on the Internet today. It provides a collaborative platform on which editors write and share what they know, which makes it a valuable source of information. Wikipedia articles have been well studied from the editor’s point of view, but Wikipedia has yet to be analyzed from the reader’s perspective. Since Wikipedia serves its users as an encyclopedia, its role as a tool for acquiring information must be examined. Readability, the ease with which a reader can understand a piece of text, plays a major role in conveying the intended meaning to readers. In this paper, we study the readability of various Wikipedia articles. In addition to judging the articles against standard readability metrics, we introduce new parameters related specifically to the comprehension of the text in Wikipedia articles. These new parameters, combined with the standard readability metrics, help classify Wikipedia articles into comprehensible and non-comprehensible classes using a support vector machine (SVM) classifier.
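
To make the notion of a standard readability metric concrete, the sketch below computes the Flesch Reading Ease score of a text. The abstract does not name the metrics used in the paper, so the choice of Flesch Reading Ease is an assumption made purely for illustration. The function names and the vowel-group syllable counter are likewise hypothetical, and the syllable counter is a crude heuristic rather than a linguistically accurate one.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, at least one."""
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate text that is easier to read.

    score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    # Naive sentence split on terminal punctuation; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    easy = "The cat sat on the mat."
    hard = "Collaborative encyclopedias necessitate comprehensive readability evaluation."
    print(round(flesch_reading_ease(easy), 3))
    print(flesch_reading_ease(easy) > flesch_reading_ease(hard))
```

A score like this could serve as one entry in a per-article feature vector, next to whatever comprehension-specific parameters are available, which a classifier such as an SVM then separates into two classes. How the paper actually builds its features is not stated in the abstract.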

Author information

Authors and Affiliations

Indian Institute of Technology, Ropar, India
Simran Setia, S. R. S. Iyengar, Amit Arjun Verma & Neeru Dubey

Corresponding author

Correspondence to Simran Setia.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Krystian Wojtkiewicz
VU Amsterdam, Amsterdam, The Netherlands
Jan Treur
University of the West of England, Bristol, UK
Elias Pimenidis
Wrocław University of Science and Technology, Wrocław, Poland
Marcin Maleszka

About this paper

Cite this paper

Setia, S., Iyengar, S.R.S., Verma, A.A., Dubey, N. (2021). Is Wikipedia Easy to Understand?: A Study Beyond Conventional Readability Metrics. In: Wojtkiewicz, K., Treur, J., Pimenidis, E., Maleszka, M. (eds) Advances in Computational Collective Intelligence. ICCCI 2021. Communications in Computer and Information Science, vol 1463. Springer, Cham. https://doi.org/10.1007/978-3-030-88113-9_14

DOI: https://doi.org/10.1007/978-3-030-88113-9_14
Published: 27 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88112-2
Online ISBN: 978-3-030-88113-9
eBook Packages: Computer Science, Computer Science (R0)
