A Deeper Investigation of the Importance of Wikipedia Links to Search Engine Results

Published: 22 April 2021

Abstract

A growing body of work has highlighted the important role that Wikipedia's volunteer-created content plays in helping search engines achieve their core goal of addressing the information needs of hundreds of millions of people. In this paper, we report the results of an investigation into the incidence of Wikipedia links in search engine results pages (SERPs). Our results extend prior work by considering three U.S. search engines, simulating both mobile and desktop devices, and using a spatial analysis approach designed to study modern SERPs that are no longer just "ten blue links". We find that Wikipedia links are extremely common in important search contexts, appearing in 67-84% of desktop SERPs for common and trending queries, but less often for medical queries. Furthermore, we observe that Wikipedia links often appear in "Knowledge Panel" SERP elements and in positions visible to users without scrolling, although Wikipedia appears less often and in less prominent positions on mobile devices. Our findings reinforce the complementary notions that (1) Wikipedia content and research have a major impact outside of the Wikipedia domain and (2) powerful technologies like search engines are highly reliant on free content created by volunteers.

    • Published in

      Proceedings of the ACM on Human-Computer Interaction  Volume 5, Issue CSCW1
      CSCW
      April 2021
      5016 pages
      EISSN:2573-0142
      DOI:10.1145/3460939
      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
