Abstract
Many users of web search engines have complained in recent years about the supposedly decreasing quality of search results. This is often attributed to an increasing amount of search-engine-optimized but low-quality content. Evidence for this has always been anecdotal, yet it is not unreasonable to think that popular online marketing strategies such as affiliate marketing incentivize the mass production of such content to maximize clicks. Since neither this complaint nor affiliate marketing as such has received much attention from the IR community, we lay the groundwork by conducting an in-depth exploratory study of how affiliate content affects today’s search engines. We monitored Google, Bing, and DuckDuckGo for a year on 7,392 product review queries. Our findings suggest that all search engines have significant problems with highly optimized (affiliate) content, more than is representative of the entire web according to a baseline retrieval system on the ClueWeb22. Focussing on the product review genre, we find that only a small portion of product reviews on the web uses affiliate marketing, but the majority of all search results do. Of all affiliate networks, Amazon Associates is by far the most popular. We further observe an inverse relationship between affiliate marketing use and content complexity, and that all search engines fall victim to large-scale affiliate link spam campaigns. However, we also notice that the line between benign content and spam in the form of content and link farms becomes increasingly blurry, a situation that will surely worsen in the wake of generative AI. We conclude that dynamic adversarial spam in the form of low-quality, mass-produced commercial content deserves more attention. (Code and data: https://github.com/webis-de/ECIR-24).
J. Bevendorff and M. Wiegmann—Equal contribution.
Notes
r is Pearson’s correlation coefficient with \(p \ll .001\), unless stated otherwise.
Acknowledgments
This publication has received funding from the European Commission under grant agreement № 101070014 (OpenWebSearch.eu).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bevendorff, J., Wiegmann, M., Potthast, M., Stein, B. (2024). Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search Engines. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610. Springer, Cham. https://doi.org/10.1007/978-3-031-56063-7_4
DOI: https://doi.org/10.1007/978-3-031-56063-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56062-0
Online ISBN: 978-3-031-56063-7
eBook Packages: Computer Science, Computer Science (R0)