Abstract
Brand search results are poisoned by fake ecommerce websites that infringe on the trademark rights of legitimate holders. In this article, we study how to tackle and measure this problem automatically. We present a pipeline with two machine learning stages that can detect the ecommerce websites present in the list of brand search results and distinguish between legitimate and fake ecommerce websites. For each classification task, we identify and extract suitable learning features and study their relative importance. Through a prototype system termed RI.SI.CO., we show that this approach is feasible, fast, and more accurate than both existing systems for trustworthiness assessment and non-expert humans. We next introduce two complementary metrics for evaluating the incidence of counterfeiting in brand search results: a chart-based measure and a single-value measure. These metrics allow us to analyze and compare counterfeiting at various levels, from single brands within a specific sector to whole sectors. In experiments on two luxury goods sectors, we report a number of findings about how the main search parameters (e.g., search engine, query type, number of search results seen) affect counterfeiting and how this activity changes over time. On the whole, our research offers new insights and practical means of analyzing and measuring counterfeiting in brand search results, thus increasing awareness and knowledge of this phenomenon and enabling targeted anti-counterfeiting actions.
Index Terms
- An Experimental Study of Automatic Detection and Measurement of Counterfeit in Brand Search Results