DOI: 10.1145/3439231.3439263
research-article

Is Simple English Wikipedia As Simple And Easy-to-Understand As We Expect It To Be?

Published: 09 June 2021

ABSTRACT

The conceptual complexity of a written text plays an important role in maintaining the reader's interest in it. Therefore, automatic text simplification systems should consider not only the lexical and syntactic complexity of a text but also its conceptual complexity. In this study, we analyze and compare two widely used English text simplification corpora, one professionally produced (Newsela) and the other collaboratively made by amateurs and enthusiasts (English Wikipedia–Simple English Wikipedia), focusing on 19 conceptual complexity features. The results indicated that the simplification operations made during the production of Simple English Wikipedia in many cases do not follow the patterns of the professionally simplified corpora, thus casting doubt on the adequacy of using Simple English Wikipedia as training material for automatic text simplification systems.


  • Published in

    DSAI '20: Proceedings of the 9th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion
    December 2020
    245 pages
    ISBN: 9781450389372
    DOI: 10.1145/3439231

    Copyright © 2020 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall acceptance rate: 17 of 23 submissions, 74%
