Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search Engines

  • Conference paper
Advances in Information Retrieval (ECIR 2024)

Abstract

Many users of web search engines have been complaining in recent years about the supposedly decreasing quality of search results. This is often attributed to an increasing amount of search-engine-optimized but low-quality content. Evidence for this has always been anecdotal, yet it’s not unreasonable to think that popular online marketing strategies such as affiliate marketing incentivize the mass production of such content to maximize clicks. Since neither this complaint nor affiliate marketing as such has received much attention from the IR community, we hereby lay the groundwork by conducting an in-depth exploratory study of how affiliate content affects today’s search engines. We monitored Google, Bing and DuckDuckGo for a year on 7,392 product review queries. Our findings suggest that all search engines have significant problems with highly optimized (affiliate) content, more than is representative of the entire web according to a baseline retrieval system on the ClueWeb22. Focussing on the product review genre, we find that only a small portion of product reviews on the web use affiliate marketing, but the majority of all search results do. Of all affiliate networks, Amazon Associates is by far the most popular. We further observe an inverse relationship between affiliate marketing use and content complexity, and that all search engines fall victim to large-scale affiliate link spam campaigns. However, we also notice that the line between benign content and spam in the form of content and link farms is becoming increasingly blurry, a situation that will surely worsen in the wake of generative AI. We conclude that dynamic adversarial spam in the form of low-quality, mass-produced commercial content deserves more attention. (Code and data: https://github.com/webis-de/ECIR-24).
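
To illustrate what "using affiliate marketing" can look like at the link level, the following is a minimal sketch, not the authors' detection method. It assumes that an Amazon Associates link can be recognised by a non-empty `tag` query parameter on an Amazon host; the function name `has_amazon_affiliate_tag` and the host check are hypothetical simplifications that ignore shortened links and other affiliate networks.

```python
from urllib.parse import urlparse, parse_qs


def has_amazon_affiliate_tag(url: str) -> bool:
    """Heuristic (assumption): Amazon host plus a non-empty `tag` parameter."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Simplified host check: any dot-separated label equal to "amazon".
    # A real classifier would match against a list of known Amazon domains.
    if "amazon" not in host.split("."):
        return False
    # parse_qs drops parameters with blank values by default,
    # so an empty `tag=` does not count as an affiliate tag.
    return bool(parse_qs(parsed.query).get("tag"))
```

A page could then be labelled as affiliate content if any of its outgoing links passes this check; the threshold and the set of networks to cover are design choices left open here.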

J. Bevendorff and M. Wiegmann—Equal contribution.

Notes

  1. https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/

  2. r is Pearson’s correlation coefficient with \(p \ll .001\), unless stated otherwise.
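
For readers unfamiliar with the statistic named in the note above, Pearson's r is the covariance of two variables divided by the product of their standard deviations. The following is a minimal sketch of that definition in plain Python; the function name `pearson_r` and the sample values are illustrative assumptions and are not taken from the paper's code or data. It does not compute the p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: perfectly linear data gives r = 1.
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

Values near 1 or -1 indicate a strong linear relationship and values near 0 a weak one.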

Acknowledgments

This publication has received funding from the European Commission under grant agreement № 101070014 (OpenWebSearch.eu).

Author information

Corresponding author

Correspondence to Janek Bevendorff.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bevendorff, J., Wiegmann, M., Potthast, M., Stein, B. (2024). Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search Engines. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610. Springer, Cham. https://doi.org/10.1007/978-3-031-56063-7_4

  • DOI: https://doi.org/10.1007/978-3-031-56063-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56062-0

  • Online ISBN: 978-3-031-56063-7

  • eBook Packages: Computer Science, Computer Science (R0)
