Is Wikipedia Easy to Understand?: A Study Beyond Conventional Readability Metrics

  • Conference paper
Advances in Computational Collective Intelligence (ICCCI 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1463)

Abstract

Wikipedia has emerged as one of the most prominent sources of information on the Internet. It provides a collaborative platform on which editors create and share information, which makes it a valuable resource. Wikipedia articles have been studied extensively from the editor’s point of view, but an analysis from the reader’s perspective has yet to be carried out. Since Wikipedia serves as an encyclopedia for its users, its role as a tool for acquiring information must be examined. The readability of a written text plays a major role in conveying the intended meaning to its readers. Readability is the ease with which a reader can understand a piece of text. In this paper, we study the readability of various Wikipedia articles. Apart from judging the readability of Wikipedia articles against standard readability metrics, we introduce new parameters related specifically to the comprehension of the text in Wikipedia articles. These new parameters, combined with standard readability metrics, help classify Wikipedia articles into comprehensible and non-comprehensible classes using a support vector machine (SVM) classifier.
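
As an illustration of what a standard readability metric computes, the sketch below implements the Flesch Reading Ease score, one widely used conventional formula. The abstract does not say which metrics the paper uses, so this choice is an assumption made only for illustration. The function names and the vowel-group syllable counter are hypothetical and deliberately crude; they are not taken from the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels.

    This is a heuristic assumed for illustration, not a dictionary lookup.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher scores indicate text that is easier to read.
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


# Two made-up example passages: short common words vs. long rare ones.
simple = "The cat sat on the mat. It was warm."
dense = "Readability quantification necessitates comprehensive morphological consideration."

# The simpler passage should receive the higher (easier) score.
print(flesch_reading_ease(simple) > flesch_reading_ease(dense))
```

A score of this kind depends only on sentence length and word length. That limitation is why a study might combine such metrics with comprehension-specific parameters before passing the features to a classifier.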

Author information

Corresponding author

Correspondence to Simran Setia.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Setia, S., Iyengar, S.R.S., Verma, A.A., Dubey, N. (2021). Is Wikipedia Easy to Understand?: A Study Beyond Conventional Readability Metrics. In: Wojtkiewicz, K., Treur, J., Pimenidis, E., Maleszka, M. (eds) Advances in Computational Collective Intelligence. ICCCI 2021. Communications in Computer and Information Science, vol 1463. Springer, Cham. https://doi.org/10.1007/978-3-030-88113-9_14

  • DOI: https://doi.org/10.1007/978-3-030-88113-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88112-2

  • Online ISBN: 978-3-030-88113-9

  • eBook Packages: Computer Science, Computer Science (R0)
