Supercalifragilisticexpialidocious: Why Using the “Right” Readability Formula in Children’s Web Search Matters

Allen, Garrett; Milton, Ashlee; Wright, Katherine Landau; Fails, Jerry Alan; Kennington, Casey; Pera, Maria Soledad

doi:10.1007/978-3-030-99736-6_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13185)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Readability is a core component of information retrieval (IR) tools as the complexity of a resource directly affects its relevance: a resource is only of use if the user can comprehend it. Even so, the link between readability and IR is often overlooked. As a step towards advancing knowledge on the influence of readability on IR, we focus on Web search for children. We explore how traditional formulas, which are simple, efficient, and portable, fare when applied to estimating the readability of Web resources for children written in English. We then present a formula well-suited for readability estimation of child-friendly Web resources. Lastly, we empirically show that readability can sway children’s information access. Outcomes from this work reveal that: (i) for Web resources targeting children, a simple formula suffices as long as it considers contemporary terminology and audience requirements, and (ii) instead of turning to Flesch-Kincaid, a popular formula, the use of the “right” formula can shape Web search tools to best serve children. The work we present herein builds on three pillars: Audience, Application, and Expertise. It serves as a blueprint to place readability estimation methods that best apply to and inform IR applications serving varied audiences.
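
To make concrete what a "traditional formula" looks like, the following is a minimal sketch of the Flesch-Kincaid grade level mentioned above, using its standard published coefficients. It is not the authors' script or their Spache-Allen formula. The function names, the regex-based tokenization, and the vowel-group syllable counter are assumptions made for illustration; production tools typically use more careful sentence splitting and syllable counting.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables with a vowel-group heuristic (an assumption,
    not a dictionary lookup; it will miscount some English words)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent 'e' as non-syllabic ("like"), but keep
    # the final syllable in words ending in "le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level (United States grades)."""
    # Naive sentence and word segmentation, assumed for this sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    # Standard coefficients: 0.39 * (words per sentence)
    # + 11.8 * (syllables per word) - 15.59.
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    harder = "Readability estimation influences information retrieval."
    print(flesch_kincaid_grade(simple))
    print(flesch_kincaid_grade(harder))
```

The formula depends only on sentence length and syllable counts, which is what makes it simple and portable, and also why it cannot account for vocabulary that is familiar or unfamiliar to a particular audience.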

Notes

1. Grade levels according to the United States’ educational system.
2. The script used for analysis purposes, along with Spache-Allen itself, can be found at https://github.com/BSU-CAST/ecir22-readability.
3. A set of learning outcomes to inform curriculum for schools in the United States.
4. RAZ uses a 26-letter scale assigned by experts for readability [47]. To enable fair comparison, we map letter labels to grade labels, using RAZ’s conversion table [46].
5. We use KORSCE’s implementation made available by the authors.
6. In Scenario 4, FK’s performance is not unexpected as KORSCE is optimized for FK.

References

https://www.lexile.com/
https://github.com/shivam5992/textstat
https://github.com/cdimascio/py-readability-metrics/blob/master/readability/data/spache_easy.txt
Albright, J., de Guzman, C., Acebo, P., Paiva, D., Faulkner, M., Swanson, J.: Readability of patient education materials: implications for clinical practice. Appl. Nurs. Res. 9(3), 139–143 (1996)
Alharthi, H., Inkpen, D.: Study of linguistic features incorporated in a literary book recommender system. In: ACM/SIGAPP SAC, pp. 1027–1034 (2019)
Aliannejadi, M., Zamani, H., Crestani, F., Croft, W.B.: Asking clarifying questions in open-domain information-seeking conversations. In: ACM SIGIR, pp. 475–484 (2019)
Allan, J., Croft, B., Moffat, A., Sanderson, M.: Frontiers, challenges, and opportunities for information retrieval: report from SWIRL 2012. In: ACM SIGIR Forum, vol. 46, pp. 2–32 (2012)
Allen, G., et al.: Engage!: co-designing search engine result pages to foster interactions. In: ACM IDC, pp. 583–587 (2021)
Allen, G., Wright, K.L., Fails, J.A., Kennington, C., Pera, M.S.: Casting a net: supporting teachers with search technology. arXiv preprint arXiv:2105.03456 (2021)
Amendum, S.J., Conradi, K., Hiebert, E.: Does text complexity matter in the elementary grades? A research synthesis of text difficulty and elementary students’ reading fluency and comprehension. Educ. Psychol. Rev. 30(1), 121–151 (2018)
Amendum, S.J., Conradi, K., Liebfreund, M.D.: The push for more challenging texts: an analysis of early readers’ rate, accuracy, and comprehension. Read. Psychol. 37(4), 570–600 (2016)
Anderson, J.: Lix and Rix: variations on a little-known readability index. J. Read. 26(6), 490–496 (1983)
Antunes, H., Lopes, C.T.: Readability of web content. In: CISTI, pp. 1–4 (2019)
Anuyah, O., Milton, A., Green, M., Pera, M.S.: An empirical analysis of search engines’ response to web search queries associated with the classroom setting. Aslib J. Inf. Manage. 72(1), 88–111 (2020)
Begeny, J.C., Greene, D.J.: Can readability formulas be used to successfully gauge difficulty of reading materials? Psychol. Sch. 51(2), 198–215 (2014)
Benjamin, R.G.: Reconstructing readability: recent developments and recommendations in the analysis of text difficulty. Educ. Psychol. Rev. 24(1), 63–88 (2012)
Bilal, D.: Comparing Google’s readability of search results to the Flesch readability formulae: a preliminary analysis on children’s search queries. Am. Soc. Inf. Sci. Technol. 50(1), 1–9 (2013)
Bilal, D., Huang, L.-M.: Readability and word complexity of SERPs snippets and web pages on children’s search queries: Google vs Bing. Aslib J. Inf. Manage. 71(2), 241–259 (2019)
Bilal, D., Kirby, J.: Differences and similarities in information seeking: children and adults as web users. IPM 38(5), 649–670 (2002)
Björnsson, C.H.: Läsbarhet: hur skall man som författare nå fram till läsarna? Bokförlaget Liber (1968)
Bruce, B., Rubin, A., Starr, K.: Why readability formulas fail. IEEE Trans. Prof. Commun. 1, 50–52 (1981)
Chall, J.S., Dale, E.: Readability Revisited: The New Dale-Chall Readability Formula. Brookline Books (1995)
Chatterjee, P., Damevski, K., Kraft, N.A., Pollock, L.: Automatically identifying the quality of developer chats for post hoc use. ACM TOSEM 30(4), 1–28 (2021)
Coleman, M., Liau, T.L.: A computer readability formula designed for machine scoring. J. Appl. Psychol. 60(2), 283 (1975)
Collins-Thompson, K., Bennett, P.N., White, R.W., De La Chica, S., Sontag, D.: Personalizing web search results by reading level. In: ACM CIKM, pp. 403–412 (2011)
Crossley, S.A., Skalicky, S., Dascalu, M.: Moving beyond classic readability formulas: new methods and new models. J. Res. Read. 42(3–4), 541–561 (2019)
Dale, E., Chall, J.S.: A formula for predicting readability: instructions. Educ. Res. Bull. 27, 37–54 (1948)
D’Alessandro, D.M., Kingsley, P., Johnson-West, J.: The readability of pediatric patient education materials on the world wide web. Arch. Pediatr. Adolesc. Med. 155(7), 807–812 (2001)
Dalip, D.H., Gonçalves, M.A., Cristo, M., Calado, P.: Exploiting user feedback to learn to rank answers in q&a forums: a case study with stack overflow. In: ACM SIGIR, pp. 543–552 (2013)
Dragovic, N., Madrazo Azpiazu, I., Pera, M.S.: “Is Sven Seven?” A search intent module for children. In: ACM SIGIR, pp. 885–888 (2016)
DuBay, W.H.: Smart Language: Readers, Readability, and the Grading of Text (2007)
Eickhoff, C., et al.: EmSe: initial evaluation of a child-friendly medical search system. In: IIiX, pp. 282–285 (2012)
Eickhoff, C., de Vries, A.P., Collins-Thompson, K.: Copulas for information retrieval. In: ACM SIGIR, pp. 663–672 (2013)
Ekstrand, M.D., Wright, K.L., Pera, M.S.: Enhancing classroom instruction with online news. Aslib J. Inf. Manage. 72(5), 725–744 (2020)
El-Haj, M., Rayson, P.: Osman–a novel Arabic readability metric. In: LREC, pp. 250–255 (2016)
Ermakova, L., et al.: Text simplification for scientific information access. In: ECIR (2021)
François, T., Miltsakaki, E.: Do NLP and machine learning improve traditional readability formulas? In: 1st Workshop on Predicting and Improving Text Readability for Target Reader Populations, pp. 49–57 (2012)
Garcia-Febo, L., Hustad, A., Rösch, H., Sturges, P., Vallotton, A.: IFLA code of ethics for librarians and other information workers. https://www.ifla.org/publications/ifla-code-of-ethics-for-librarians-and-other-information-workers-short-version-/
Gonzalez-Dios, I., Aranzabe, M.J., de Ilarraza, A.D., Salaberri, H.: Simple or complex? Assessing the readability of Basque Texts. In: COLING, pp. 334–344 (2014)
Gunning, R.: The fog index after twenty years. J. Bus. Commun. 6(2), 3–13 (1969)
Gwizdka, J., Bilal, D.: Analysis of children’s queries and click behavior on ranked results and their thought processes in Google search. In: CHIIR, pp. 377–380 (2017)
Common Core State Standards Initiative: Appendix B: text exemplars and sample performance tasks (2020). http://www.corestandards.org/assets/Appendix_B.pdf
Kincaid, J.P., Fishburne, R.P., Jr., Rogers, R.L., Chissom, B.S.: Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for navy enlisted personnel. Technical report, Naval Technical Training Command Millington TN Research Branch (1975)
Kruskal, W.H., Wallis, W.A.: Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 47(260), 583–621 (1952)
Kuperman, V., Stadthagen-Gonzalez, H., Brysbaert, M.: Age-of-acquisition ratings for 30,000 English words. Behav. Res. Meth. 44(4), 978–990 (2012)
Lazel, I.: Level correlation chart (2021). https://www.readinga-z.com/learninga-z-levels/level-correlation-chart/. Accessed 18 Jan 2021
Lazel, I.: Reading A-Z: the online reading program with downloadable books to print and assemble (2021). https://www.readinga-z.com/. Accessed 18 Jan 2021
Le, L.T., Shah, C., Choi, E.: Evaluating the quality of educational answers in community question-answering. In: IEEE/ACM JCDL, pp. 129–138 (2016)
Lin, C.Y., Wu, Y.-H., Chen, A.L.P.: Selecting the most helpful answers in online health question answering communities. J. Intell. Inf. Syst. 57(2), 271–293 (2021)
Liu, L., Koutrika, G., Wu, S.: LearningAssistant: a novel learning resource recommendation system. In: IEEE ICDE, pp. 1424–1427 (2015)
Madrazo Azpiazu, I.: Towards multipurpose readability assessment. Master’s thesis, Boise State University (2016). https://scholarworks.boisestate.edu/td/1210/
Madrazo Azpiazu, I., Dragovic, N., Anuyah, O., Pera, M.S.: Looking for the movie Seven or Sven from the movie frozen? A multi-perspective strategy for recommending queries for children. In: ACM CHIIR, pp. 92–101 (2018)
Madrazo Azpiazu, I., Dragovic, N., Pera, M.S.: Finding, understanding and learning: making information discovery tasks useful for children and teachers. In: SAL Workshop co-located with ACM SIGIR (2016)
Madrazo Azpiazu, I., Dragovic, N., Pera, M.S., Fails, J.A.: Online searching and learning: YUM and other search tools for children and teachers. Inf. Retr. J. 20(5), 524–545 (2017)
Madrazo Azpiazu, I., Pera, M.S.: Multiattentive recurrent neural network architecture for multilingual readability assessment. TACL 7, 421–436 (2019)
Madrazo Azpiazu, I., Pera, M.S.: An analysis of transfer learning methods for multilingual readability assessment. In: Adjunct Publication of the 28th ACM UMAP, pp. 95–100 (2020)
McLaughlin, G.H.: SMOG grading: a new readability formula. J. Read. 12(8), 639–646 (1969)
Meng, C., Chen, M., Mao, J., Neville, J.: ReadNet: a hierarchical transformer framework for web article readability analysis. In: Jose, J.M., et al. (eds.) Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I, pp. 33–49. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_3
Milton, A., Allen, G., Pera, M.S.: To infinity and beyond! Accessibility is the future for kids’ search engines. arXiv preprint arXiv:2106.07813 (2021)
Milton, A., Anuyah, O., Spear, L., Wright, K.L., Pera, M.S.: A ranking strategy to promote resources supporting the classroom environment. In: IEEE/WIC/ACM WI-IAT, pp. 121–128 (2020)
Miltsakaki, E., Troutt, A.: Read-X: automatic evaluation of reading difficulty of web text. In: E-Learn, pp. 7280–7286. AACE (2007)
Mohammadi, H., Khasteh, S.H.: Text as environment: a deep reinforcement learning text readability assessment model. arXiv preprint arXiv:1912.05957 (2019)
Newsela: Newsela article corpus (2016). https://newsela.com/data
Ngada, O., Haskins, B.: Fake news detection using content-based features and machine learning. In: IEEE CSDE, pp. 1–6 (2020)
Otto, C., et al.: Predicting knowledge gain during web search based on multimedia resource consumption. In: AIED, pp. 318–330 (2021)
Pera, M.S., Ng, Y.K.: Automating readers’ advisory to make book recommendations for k-12 readers. In: ACM RecSys, pp. 9–16 (2014)
Ramiro, C., Srinivasan, M., Malt, B.C., Xu, Y.: Algorithms in the historical emergence of word senses. Nat. Acad. Sci. 115(10), 2323–2328 (2018)
Reed, D.K., Kershaw-Herrera, S.: An examination of text complexity as characterized by readability and cohesion. J. Exp. Educ. 84(1), 75–97 (2016)
Roy, N., Torre, M.V., Gadiraju, U., Maxwell, D., Hauff, C.: Note the highlight: incorporating active reading tools in a search as learning environment. In: ACM CHIIR, pp. 229–238 (2021)
Saptono, R., Mine, T.: Time-based sampling methods for detecting helpful reviews. In: IEEE/WIC/ACM WI-IAT, pp. 508–513 (2020)
Spache, G.D.: The Spache readability formula. In: Good Reading for Poor Readers, pp. 195–207 (1974)
Spaulding, S.: A Spanish readability formula. Mod. Lang. J. 40(8), 433–441 (1956)
Szabo, S., Sinclair, B.: STAAR reading passages: the readability is too high. Schooling 3(1), 1–14 (2012)
Szabo, S., Sinclair, B.B.: Readability of the STAAR test is still misaligned. Schooling 10(1), 1–12 (2019)
Tahir, M., et al.: Evaluation of quality and readability of online health information on high blood pressure using DISCERN and Flesch-Kincaid tools. Appl. Sci. 10(9), 3214 (2020)
Taranova, A., Braschler, M.: Textual complexity as an indicator of document relevance. In: Hiemstra, D., Moens, M.-F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds.) ECIR 2021. LNCS, vol. 12657, pp. 410–417. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72240-1_42
Vajjala, S., Meurers, D.: On improving the accuracy of readability classification using insights from second language acquisition. In: 7th Workshop on Building Educational Applications using NLP, pp. 163–173 (2012)
Vajjala, S., Meurers, D.: On the applicability of readability models to web texts. In: 2nd Workshop on Predicting and Improving Text Readability for Target Reader Populations, pp. 59–68 (2013)
Wang, H.X.: Developing and testing readability measurements for second language learners. Ph.D. thesis, Queensland University of Technology (2016)
Westervelf, T.: Wizenoze search white paper (2021). https://cdn.theewf.org/uploads/pdf/Wizenoze-white-paper.pdf
Wizenoze: Wizenoze readability index (2021). http://www.wizenoze.com
Wojciechowski, A., Gorzynski, K.: A method for measuring similarity of books: a step towards an objective recommender system for readers. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds.) LTC 2013. LNCS (LNAI), vol. 9561, pp. 161–174. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43808-5_13
Wong, K., Levi, J.R.: Readability of pediatric otolaryngology information by children’s hospitals and academic institutions. Laryngoscope 127(4), E138–E144 (2017)
Xia, M., Kochmar, E., Briscoe, T.: Text readability assessment for second language learners. arXiv preprint arXiv:1906.07580 (2019)
Yu, C.H., Miller, R.C.: Enhancing web page readability for non-native readers. In: CHI 2010, pp. 2523–2532 (2010)

Acknowledgments

Work partially funded by NSF Award #1763649. The authors would like to thank Dr. Ion Madrazo Azpiazu and Dr. Michael D. Ekstrand for their valuable feedback.

Author information

Authors and Affiliations

Department of Computer Science, Boise State University, Boise, ID, USA
Garrett Allen, Jerry Alan Fails, Casey Kennington & Maria Soledad Pera
Department of Literacy, Language and Culture, Boise State University, Boise, ID, USA
Katherine Landau Wright
University of Minnesota, Minneapolis, MN, 55455, USA
Ashlee Milton

Authors

Garrett Allen
Ashlee Milton
Katherine Landau Wright
Jerry Alan Fails
Casey Kennington
Maria Soledad Pera

Corresponding author

Correspondence to Garrett Allen.

Editor information

Editors and Affiliations

Martin Luther University Halle-Wittenberg, Halle, Germany
Matthias Hagen
Leiden University, Leiden, The Netherlands
Suzan Verberne
University of Glasgow, Glasgow, UK
Craig Macdonald
University of Duisburg-Essen, Essen, Germany
Christin Seifert
University of Stavanger, Stavanger, Norway
Krisztian Balog
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Stavanger, Stavanger, Norway
Vinay Setty

Cite this paper

Allen, G., Milton, A., Wright, K.L., Fails, J., Kennington, C., Pera, M.S. (2022). Supercalifragilisticexpialidocious: Why Using the “Right” Readability Formula in Children’s Web Search Matters. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13185. Springer, Cham. https://doi.org/10.1007/978-3-030-99736-6_1

DOI: https://doi.org/10.1007/978-3-030-99736-6_1
Published: 05 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99735-9
Online ISBN: 978-3-030-99736-6
eBook Packages: Computer Science, Computer Science (R0)