research-article

An Experimental Study of Automatic Detection and Measurement of Counterfeit in Brand Search Results

Authors:
Claudio Carpineto, Fondazione Ugo Bordoni, Rome, Italy
Giovanni Romano, Fondazione Ugo Bordoni, Rome, Italy

ACM Transactions on the Web, Volume 14, Issue 2, Article No. 6, pp. 1–35. https://doi.org/10.1145/3378443

Published: 07 February 2020

Abstract

Brand search results are poisoned by fake ecommerce websites that infringe on the trademark rights of legitimate holders. In this article, we study how to tackle and measure this problem automatically. We present a pipeline with two machine learning stages that can detect the ecommerce websites present in the list of brand search results and distinguish between legitimate and fake ecommerce websites. For each classification task, we identify and extract suitable learning features and study their relative importance. Through a prototype system termed RI.SI.CO., we show that this approach is feasible, fast, and more accurate than both existing systems for trustworthiness assessment and non-expert humans. We next introduce two complementary metrics for evaluating the counterfeit incidence in brand search results: namely, a chart-based and a single-value measure. They allow us to analyze and compare counterfeit at various levels, including single brands within a specific sector as well as whole sectors. Experimenting with two luxury goods sectors, we report a number of interesting findings about how the main search parameters (e.g., search engine, query type, number of search results seen) affect counterfeiting and how this activity changes with time. On the whole, our research offers new insights and some very practical and useful means of analyzing and measuring counterfeit in brand search results, thus increasing awareness of and knowledge about this phenomenon and enabling targeted anti-counterfeiting actions.
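
The two-stage idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Result` type, the keyword-based stand-ins `looks_like_shop` and `looks_fake` (the article uses learned classifiers whose features are not given in the abstract), and `fake_share_at_k`, a generic "share of fakes in the top k results" curve that is not the article's chart-based or single-value measure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    """One search result (hypothetical minimal representation)."""
    url: str
    text: str


# Hypothetical stand-ins for the two learned classifiers; the keyword
# lists are illustrative only and are not the article's features.
def looks_like_shop(r: Result) -> bool:
    keywords = ("add to cart", "checkout", "free shipping")
    return any(k in r.text.lower() for k in keywords)


def looks_fake(r: Result) -> bool:
    keywords = ("replica", "80% off", "cheap")
    return any(k in r.text.lower() for k in keywords)


def classify(
    results: List[Result],
    is_shop: Callable[[Result], bool] = looks_like_shop,
    is_fake: Callable[[Result], bool] = looks_fake,
) -> List[str]:
    """Stage 1 filters ecommerce sites; stage 2 labels them fake or legitimate."""
    labels = []
    for r in results:
        if not is_shop(r):
            labels.append("non-shop")
        elif is_fake(r):
            labels.append("fake")
        else:
            labels.append("legitimate")
    return labels


def fake_share_at_k(labels: List[str]) -> List[float]:
    """Share of results labeled fake within the top k, for k = 1..n."""
    shares = []
    fakes = 0
    for k, label in enumerate(labels, start=1):
        if label == "fake":
            fakes += 1
        shares.append(fakes / k)
    return shares


results = [
    Result("https://brand.example", "Official store. Add to cart. Checkout."),
    Result("https://news.example", "Brand opens a new flagship store in Rome."),
    Result("https://deals.example", "Replica bags 80% off, add to cart now"),
]
labels = classify(results)
shares = fake_share_at_k(labels)
```

Passing the two classifiers as parameters keeps the stages independent, so either stand-in could be swapped for a trained model without changing the pipeline.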

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. World Wide Web

Published in
ACM Transactions on the Web Volume 14, Issue 2
May 2020
149 pages
ISSN:1559-1131
EISSN:1559-114X
DOI:10.1145/3382502
Editor:
Brian D. Davison
Lehigh University, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 7 February 2020
- Accepted: 1 December 2019
- Revised: 1 April 2019
- Received: 1 April 2018

Author Tags
Online counterfeit goods
cybercrime measurement
spam detection in web search results
trustworthiness assessment of eshops
website classification
Qualifiers
- research-article
- Research
- Refereed