Supercalifragilisticexpialidocious: Why Using the “Right” Readability Formula in Children’s Web Search Matters

  • Conference paper
Advances in Information Retrieval (ECIR 2022)

Abstract

Readability is a core component of information retrieval (IR) tools, as the complexity of a resource directly affects its relevance: a resource is only of use if the user can comprehend it. Even so, the link between readability and IR is often overlooked. As a step towards advancing knowledge on the influence of readability on IR, we focus on Web search for children. We explore how traditional formulas, which are simple, efficient, and portable, fare when applied to estimating the readability of English-language Web resources for children. We then present a formula well-suited for readability estimation of child-friendly Web resources. Lastly, we empirically show that readability can sway children’s information access. Outcomes from this work reveal that: (i) for Web resources targeting children, a simple formula suffices as long as it considers contemporary terminology and audience requirements, and (ii) rather than defaulting to Flesch-Kincaid, a popular formula, using the “right” formula can shape Web search tools to best serve children. The work we present herein builds on three pillars: Audience, Application, and Expertise. It serves as a blueprint for identifying the readability estimation methods that best apply to and inform IR applications serving varied audiences.
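For readers unfamiliar with the traditional formulas the abstract contrasts, a minimal sketch of the Flesch-Kincaid Grade Level [43] follows. It assumes a naive regex tokenizer and a vowel-group syllable heuristic (both are simplifications introduced here, not taken from the paper), so its scores will differ somewhat from library implementations such as textstat [2].

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A final silent "e" (as in "make") usually adds no syllable; "-le" (as in "table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    # Naive sentence split on terminal punctuation; empty fragments are discarded.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    print(round(flesch_kincaid_grade("The cat sat on the mat."), 2))
```

For "The cat sat on the mat." (six one-syllable words in one sentence) this returns about -1.45. The formula rewards short sentences and short words only; it has no notion of whether a word is familiar to a child, which is one reason a vocabulary-aware formula can behave differently on children’s Web resources.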

Notes

  1. Grade levels according to the United States’ educational system.

  2. The script used for analysis purposes, along with the Spache-Allen formula itself, can be found at https://github.com/BSU-CAST/ecir22-readability.

  3. A set of learning outcomes to inform curriculum for schools in the United States.

  4. RAZ uses a 26-letter readability scale assigned by experts [47]. To enable fair comparison, we map letter labels to grade labels using RAZ’s conversion table [46].

  5. We use KORSCE’s implementation made available by the authors.

  6. In Scenario 4, FK’s performance is unsurprising, as KORSCE is optimized for FK.
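Note 2 points to the released Spache-Allen formula. As background only, the classic revised Spache formula [71], on which the name Spache-Allen suggests it builds, can be sketched as below. This is not the authors’ Spache-Allen implementation (that code and its word list are in the repository cited in Note 2). The coefficients are the commonly reported ones for the revised Spache formula, the tokenizer is a simplifying assumption, and the familiar-word list is a parameter so that the Spache easy-word list [3], or an updated list reflecting contemporary terminology, can be supplied.

```python
import re


def spache_grade(text: str, familiar_words: set[str]) -> float:
    """Revised Spache grade: 0.121*ASL + 0.082*PUW + 0.659.

    ASL is the average sentence length in words. PUW is the number of *unique*
    words not found in `familiar_words` (expected lowercase), expressed as a
    percentage of all words in the text.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    asl = len(words) / len(sentences)
    unfamiliar = {w for w in words if w not in familiar_words}
    puw = 100.0 * len(unfamiliar) / len(words)
    return 0.121 * asl + 0.082 * puw + 0.659


if __name__ == "__main__":
    # Hypothetical toy word list for illustration only; not the Spache list [3] or Spache-Allen's list.
    toy_list = {"the", "cat", "sat", "on", "mat"}
    text = "The cat sat on the mat. The cat likes video games."
    print(round(spache_grade(text, toy_list), 2))
    # Treating more everyday terms as familiar lowers the estimated grade.
    print(round(spache_grade(text, toy_list | {"likes", "video", "games"}), 2))
```

Because the unfamiliar-word term dominates for short sentences, the choice of word list directly moves the estimate; this is where a formula that accounts for contemporary terminology can diverge from one built on an older list.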

References

  1. https://www.lexile.com/

  2. https://github.com/shivam5992/textstat

  3. https://github.com/cdimascio/py-readability-metrics/blob/master/readability/data/spache_easy.txt

  4. Albright, J., de Guzman, C., Acebo, P., Paiva, D., Faulkner, M., Swanson, J.: Readability of patient education materials: implications for clinical practice. Appl. Nurs. Res. 9(3), 139–143 (1996)

  5. Alharthi, H., Inkpen, D.: Study of linguistic features incorporated in a literary book recommender system. In: ACM/SIGAPP SAC, pp. 1027–1034 (2019)

  6. Aliannejadi, M., Zamani, H., Crestani, F., Croft, W.B.: Asking clarifying questions in open-domain information-seeking conversations. In: ACM SIGIR, pp. 475–484 (2019)

  7. Allan, J., Croft, B., Moffat, A., Sanderson, M.: Frontiers, challenges, and opportunities for information retrieval: report from SWIRL 2012. In: ACM SIGIR Forum, vol. 46, pp. 2–32 (2012)

  8. Allen, G., et al.: Engage!: co-designing search engine result pages to foster interactions. In: ACM IDC, pp. 583–587 (2021)

  9. Allen, G., Wright, K.L., Fails, J.A., Kennington, C., Pera, M.S.: Casting a net: supporting teachers with search technology. arXiv preprint arXiv:2105.03456 (2021)

  10. Amendum, S.J., Conradi, K., Hiebert, E.: Does text complexity matter in the elementary grades? A research synthesis of text difficulty and elementary students’ reading fluency and comprehension. Educ. Psychol. Rev. 30(1), 121–151 (2018)

  11. Amendum, S.J., Conradi, K., Liebfreund, M.D.: The push for more challenging texts: an analysis of early readers’ rate, accuracy, and comprehension. Read. Psychol. 37(4), 570–600 (2016)

  12. Anderson, J.: Lix and Rix: variations on a little-known readability index. J. Read. 26(6), 490–496 (1983)

  13. Antunes, H., Lopes, C.T.: Readability of web content. In: CISTI, pp. 1–4 (2019)

  14. Anuyah, O., Milton, A., Green, M., Pera, M.S.: An empirical analysis of search engines’ response to web search queries associated with the classroom setting. Aslib J. Inf. Manage. 72(1), 88–111 (2020)

  15. Begeny, J.C., Greene, D.J.: Can readability formulas be used to successfully gauge difficulty of reading materials? Psychol. Sch. 51(2), 198–215 (2014)

  16. Benjamin, R.G.: Reconstructing readability: recent developments and recommendations in the analysis of text difficulty. Educ. Psychol. Rev. 24(1), 63–88 (2012)

  17. Bilal, D.: Comparing Google’s readability of search results to the Flesch readability formulae: a preliminary analysis on children’s search queries. Am. Soc. Inf. Sci. Technol. 50(1), 1–9 (2013)

  18. Bilal, D., Huang, L.-M.: Readability and word complexity of SERPs snippets and web pages on children’s search queries: Google vs Bing. Aslib J. Inf. Manage. 71(2), 241–259 (2019)

  19. Bilal, D., Kirby, J.: Differences and similarities in information seeking: children and adults as web users. IPM 38(5), 649–670 (2002)

  20. Björnsson, C.H.: Läsbarhet: hur skall man som författare nå fram till läsarna? Bokförlaget Liber (1968)

  21. Bruce, B., Rubin, A., Starr, K.: Why readability formulas fail. IEEE Trans. Prof. Commun. 1, 50–52 (1981)

  22. Chall, J.S., Dale, E.: Readability Revisited: The New Dale-Chall Readability Formula. Brookline Books (1995)

  23. Chatterjee, P., Damevski, K., Kraft, N.A., Pollock, L.: Automatically identifying the quality of developer chats for post hoc use. ACM TOSEM 30(4), 1–28 (2021)

  24. Coleman, M., Liau, T.L.: A computer readability formula designed for machine scoring. J. Appl. Psychol. 60(2), 283 (1975)

  25. Collins-Thompson, K., Bennett, P.N., White, R.W., De La Chica, S., Sontag, D.: Personalizing web search results by reading level. In: ACM CIKM, pp. 403–412 (2011)

  26. Crossley, S.A., Skalicky, S., Dascalu, M.: Moving beyond classic readability formulas: new methods and new models. J. Res. Read. 42(3–4), 541–561 (2019)

  27. Dale, E., Chall, J.S.: A formula for predicting readability: instructions. Educ. Res. Bull. 27, 37–54 (1948)

  28. D’Alessandro, D.M., Kingsley, P., Johnson-West, J.: The readability of pediatric patient education materials on the world wide web. Arch. Pediatr. Adolesc. Med. 155(7), 807–812 (2001)

  29. Dalip, D.H., Gonçalves, M.A., Cristo, M., Calado, P.: Exploiting user feedback to learn to rank answers in Q&A forums: a case study with Stack Overflow. In: ACM SIGIR, pp. 543–552 (2013)

  30. Dragovic, N., Madrazo Azpiazu, I., Pera, M.S.: “Is Sven Seven?” A search intent module for children. In: ACM SIGIR, pp. 885–888 (2016)

  31. DuBay, W.H.: Smart Language: Readers, Readability, and the Grading of Text (2007)

  32. Eickhoff, C., et al.: EmSe: initial evaluation of a child-friendly medical search system. In: IIiX, pp. 282–285 (2012)

  33. Eickhoff, C., de Vries, A.P., Collins-Thompson, K.: Copulas for information retrieval. In: ACM SIGIR, pp. 663–672 (2013)

  34. Ekstrand, M.D., Wright, K.L., Pera, M.S.: Enhancing classroom instruction with online news. Aslib J. Inf. Manage. 72(5), 725–744 (2020)

  35. El-Haj, M., Rayson, P.: OSMAN: a novel Arabic readability metric. In: LREC, pp. 250–255 (2016)

  36. Ermakova, L., et al.: Text simplification for scientific information access. In: ECIR (2021)

  37. François, T., Miltsakaki, E.: Do NLP and machine learning improve traditional readability formulas? In: 1st Workshop on Predicting and Improving Text Readability for Target Reader Populations, pp. 49–57 (2012)

  38. Garcia-Febo, L., Hustad, A., Rösch, H., Sturges, P., Vallotton, A.: IFLA code of ethics for librarians and other information workers. https://www.ifla.org/publications/ifla-code-of-ethics-for-librarians-and-other-information-workers-short-version-/

  39. Gonzalez-Dios, I., Aranzabe, M.J., de Ilarraza, A.D., Salaberri, H.: Simple or complex? Assessing the readability of Basque Texts. In: COLING, pp. 334–344 (2014)

  40. Gunning, R.: The fog index after twenty years. J. Bus. Commun. 6(2), 3–13 (1969)

  41. Gwizdka, J., Bilal, D.: Analysis of children’s queries and click behavior on ranked results and their thought processes in Google search. In: CHIIR, pp. 377–380 (2017)

  42. Common Core State Standards Initiative: Appendix B: text exemplars and sample performance tasks (2020). http://www.corestandards.org/assets/Appendix_B.pdf

  43. Kincaid, J.P., Fishburne, R.P., Jr., Rogers, R.L., Chissom, B.S.: Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for navy enlisted personnel. Technical report, Naval Technical Training Command Millington TN Research Branch (1975)

  44. Kruskal, W.H., Wallis, W.A.: Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 47(260), 583–621 (1952)

  45. Kuperman, V., Stadthagen-Gonzalez, H., Brysbaert, M.: Age-of-acquisition ratings for 30,000 English words. Behav. Res. Meth. 44(4), 978–990 (2012)

  46. Lazel, I.: Level correlation chart (2021). https://www.readinga-z.com/learninga-z-levels/level-correlation-chart/. Accessed 18 Jan 2021

  47. Lazel, I.: Reading A-Z: the online reading program with downloadable books to print and assemble (2021). https://www.readinga-z.com/. Accessed 18 Jan 2021

  48. Le, L.T., Shah, C., Choi, E.: Evaluating the quality of educational answers in community question-answering. In: IEEE/ACM JCDL, pp. 129–138 (2016)

  49. Lin, C.Y., Wu, Y.-H., Chen, A.L.P.: Selecting the most helpful answers in online health question answering communities. J. Intell. Inf. Syst. 57(2), 271–293 (2021)

  50. Liu, L., Koutrika, G., Wu, S.: LearningAssistant: a novel learning resource recommendation system. In: IEEE ICDE, pp. 1424–1427 (2015)

  51. Madrazo Azpiazu, I.: Towards multipurpose readability assessment. Master’s thesis, Boise State University (2016). https://scholarworks.boisestate.edu/td/1210/

  52. Madrazo Azpiazu, I., Dragovic, N., Anuyah, O., Pera, M.S.: Looking for the movie Seven or Sven from the movie frozen? A multi-perspective strategy for recommending queries for children. In: ACM CHIIR, pp. 92–101 (2018)

  53. Madrazo Azpiazu, I., Dragovic, N., Pera, M.S.: Finding, understanding and learning: making information discovery tasks useful for children and teachers. In: SAL Workshop co-located with ACM SIGIR (2016)

  54. Madrazo Azpiazu, I., Dragovic, N., Pera, M.S., Fails, J.A.: Online searching and learning: YUM and other search tools for children and teachers. Inf. Retr. J. 20(5), 524–545 (2017)

  55. Madrazo Azpiazu, I., Pera, M.S.: Multiattentive recurrent neural network architecture for multilingual readability assessment. TACL 7, 421–436 (2019)

  56. Madrazo Azpiazu, I., Pera, M.S.: An analysis of transfer learning methods for multilingual readability assessment. In: Adjunct Publication of the 28th ACM UMAP, pp. 95–100 (2020)

  57. McLaughlin, G.H.: SMOG grading: a new readability formula. J. Read. 12(8), 639–646 (1969)

  58. Meng, C., Chen, M., Mao, J., Neville, J.: ReadNet: a hierarchical transformer framework for web article readability analysis. In: Jose, J.M., et al. (eds.) Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I, pp. 33–49. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_3

  59. Milton, A., Allen, G., Pera, M.S.: To infinity and beyond! Accessibility is the future for kids’ search engines. arXiv preprint arXiv:2106.07813 (2021)

  60. Milton, A., Anuyah, O., Spear, L., Wright, K.L., Pera, M.S.: A ranking strategy to promote resources supporting the classroom environment. In: IEEE/WIC/ACM WI-IAT, pp. 121–128 (2020)

  61. Miltsakaki, E., Troutt, A.: Read-X: automatic evaluation of reading difficulty of web text. In: E-Learn, pp. 7280–7286. AACE (2007)

  62. Mohammadi, H., Khasteh, S.H.: Text as environment: a deep reinforcement learning text readability assessment model. arXiv preprint arXiv:1912.05957 (2019)

  63. Newsela: Newsela article corpus (2016). https://newsela.com/data

  64. Ngada, O., Haskins, B.: Fake news detection using content-based features and machine learning. In: IEEE CSDE, pp. 1–6 (2020)

  65. Otto, C., et al.: Predicting knowledge gain during web search based on multimedia resource consumption. In: AIED, pp. 318–330 (2021)

  66. Pera, M.S., Ng, Y.K.: Automating readers’ advisory to make book recommendations for k-12 readers. In: ACM RecSys, pp. 9–16 (2014)

  67. Ramiro, C., Srinivasan, M., Malt, B.C., Xu, Y.: Algorithms in the historical emergence of word senses. Proc. Natl. Acad. Sci. 115(10), 2323–2328 (2018)

  68. Reed, D.K., Kershaw-Herrera, S.: An examination of text complexity as characterized by readability and cohesion. J. Exp. Educ. 84(1), 75–97 (2016)

  69. Roy, N., Torre, M.V., Gadiraju, U., Maxwell, D., Hauff, C.: Note the highlight: incorporating active reading tools in a search as learning environment. In: ACM CHIIR, pp. 229–238 (2021)

  70. Saptono, R., Mine, T.: Time-based sampling methods for detecting helpful reviews. In: IEEE/WIC/ACM WI-IAT, pp. 508–513 (2020)

  71. Spache, G.D.: The Spache readability formula. In: Good Reading for Poor Readers, pp. 195–207 (1974)

  72. Spaulding, S.: A Spanish readability formula. Mod. Lang. J. 40(8), 433–441 (1956)

  73. Szabo, S., Sinclair, B.: STAAR reading passages: the readability is too high. Schooling 3(1), 1–14 (2012)

  74. Szabo, S., Sinclair, B.B.: Readability of the STAAR test is still misaligned. Schooling 10(1), 1–12 (2019)

  75. Tahir, M., et al.: Evaluation of quality and readability of online health information on high blood pressure using DISCERN and Flesch-Kincaid tools. Appl. Sci. 10(9), 3214 (2020)

  76. Taranova, A., Braschler, M.: Textual complexity as an indicator of document relevance. In: Hiemstra, D., Moens, M.-F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds.) ECIR 2021. LNCS, vol. 12657, pp. 410–417. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72240-1_42

  77. Vajjala, S., Meurers, D.: On improving the accuracy of readability classification using insights from second language acquisition. In: 7th Workshop on Building Educational Applications using NLP, pp. 163–173 (2012)

  78. Vajjala, S., Meurers, D.: On the applicability of readability models to web texts. In: 2nd Workshop on Predicting and Improving Text Readability for Target Reader Populations, pp. 59–68 (2013)

  79. Wang, H.X.: Developing and testing readability measurements for second language learners. Ph.D. thesis, Queensland University of Technology (2016)

  80. Westervelf, T.: Wizenoze search white paper (2021). https://cdn.theewf.org/uploads/pdf/Wizenoze-white-paper.pdf

  81. Wizenoze: Wizenoze readability index (2021). http://www.wizenoze.com

  82. Wojciechowski, A., Gorzynski, K.: A method for measuring similarity of books: a step towards an objective recommender system for readers. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds.) LTC 2013. LNCS (LNAI), vol. 9561, pp. 161–174. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43808-5_13

  83. Wong, K., Levi, J.R.: Readability of pediatric otolaryngology information by children’s hospitals and academic institutions. Laryngoscope 127(4), E138–E144 (2017)

  84. Xia, M., Kochmar, E., Briscoe, T.: Text readability assessment for second language learners. arXiv preprint arXiv:1906.07580 (2019)

  85. Yu, C.H., Miller, R.C.: Enhancing web page readability for non-native readers. In: CHI 2010, pp. 2523–2532 (2010)

Acknowledgments

Work partially funded by NSF Award #1763649. The authors would like to thank Dr. Ion Madrazo Azpiazu and Dr. Michael D. Ekstrand for their valuable feedback.

Author information

Corresponding author: Garrett Allen.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Allen, G., Milton, A., Wright, K.L., Fails, J., Kennington, C., Pera, M.S. (2022). Supercalifragilisticexpialidocious: Why Using the “Right” Readability Formula in Children’s Web Search Matters. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13185. Springer, Cham. https://doi.org/10.1007/978-3-030-99736-6_1

  • DOI: https://doi.org/10.1007/978-3-030-99736-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99735-9

  • Online ISBN: 978-3-030-99736-6

  • eBook Packages: Computer Science, Computer Science (R0)
