An Experimental Study of Automatic Detection and Measurement of Counterfeit in Brand Search Results

Published: 07 February 2020

Abstract

Brand search results are poisoned by fake ecommerce websites that infringe on the trademark rights of legitimate holders. In this article, we study how to tackle and measure this problem automatically. We present a pipeline with two machine learning stages that can detect the ecommerce websites present in the list of brand search results and distinguish between legitimate and fake ecommerce websites. For each classification task, we identify and extract suitable learning features and study their relative importance. Through a prototype system termed RI.SI.CO., we show that this approach is feasible, fast, and more accurate than both existing systems for trustworthiness assessment and non-expert humans. We next introduce two complementary metrics for evaluating the counterfeit incidence in brand search results: namely, a chart-based and a single-value measure. They allow us to analyze and compare counterfeit at various levels, including single brands within a specific sector as well as whole sectors. Experimenting with two luxury goods sectors, we report a number of interesting findings about how the main search parameters (e.g., search engine, query type, number of search results seen) affect counterfeiting and how this activity changes with time. On the whole, our research offers new insights and some very practical and useful means of analyzing and measuring counterfeit in brand search results, thus increasing awareness of and knowledge about this phenomenon and enabling targeted anti-counterfeiting actions.
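As a rough illustration of the two-stage idea and of a chart-based versus single-value view of counterfeit incidence, the following minimal sketch may help. It is not the RI.SI.CO. implementation: the classifiers are passed in as plain callables, and the names `label_results`, `fake_rate_at_k`, and `fake_rate_curve`, as well as the simple "fraction of fakes in the top k" measure, are assumptions made here for illustration rather than the metrics defined in the article.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained model: maps a result URL to True/False.
Classifier = Callable[[str], bool]


def label_results(urls: Sequence[str],
                  is_ecommerce: Classifier,
                  is_fake: Classifier) -> List[str]:
    """Stage 1 detects ecommerce sites; stage 2 runs only on those."""
    labels = []
    for url in urls:
        if not is_ecommerce(url):
            labels.append("other")
        elif is_fake(url):
            labels.append("fake")
        else:
            labels.append("legitimate")
    return labels


def fake_rate_at_k(labels: Sequence[str], k: int) -> float:
    """Single value (assumed measure): share of the top-k results labeled fake."""
    top = labels[:k]
    if not top:
        return 0.0
    return sum(label == "fake" for label in top) / len(top)


def fake_rate_curve(labels: Sequence[str]) -> List[float]:
    """Chart-style view (assumed): fake rate for every cutoff k = 1..n."""
    return [fake_rate_at_k(labels, k) for k in range(1, len(labels) + 1)]


if __name__ == "__main__":
    # Toy ranked result list and toy rule-based classifiers (all hypothetical).
    results = [
        "brand.example",
        "news.example/brand",
        "cheap-brand-outlet.example",
        "shop.example",
    ]
    labels = label_results(
        results,
        is_ecommerce=lambda url: "news" not in url,
        is_fake=lambda url: "cheap" in url,
    )
    print(labels)
    print(fake_rate_curve(labels))
```

Running the second stage only on results that pass the first keeps the fake/legitimate decision restricted to ecommerce sites, which matches the order of the two tasks described in the abstract; everything else in the sketch is a placeholder.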

          • Published in

            ACM Transactions on the Web, Volume 14, Issue 2
            May 2020
            149 pages
            ISSN: 1559-1131
            EISSN: 1559-114X
            DOI: 10.1145/3382502

            Copyright © 2020 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 7 February 2020
            • Accepted: 1 December 2019
            • Revised: 1 April 2019
            • Received: 1 April 2018

            Qualifiers

            • research-article
            • Research
            • Refereed
