Abstract
Click fraud is a serious problem facing the online advertising business. Malicious clicks on online ads, whether committed by humans or by non-humans, force financial losses on advertisers who use pay-per-click advertising. Non-human traffic is usually designed to inflate web traffic for fraudulent purposes. In this paper, we demonstrate a hybrid approach consisting of a two-level fingerprint applied in two phases to detect illegitimate non-human traffic. The first-level fingerprint is a pattern generated from immutable information about a user navigating a website’s pages. It is used in the first detection phase to infer rules about illegitimate non-human traffic from an ontology we developed for web traffic legitimacy. The second-level fingerprint is generated from behavioral ad-click patterns and is used in the second detection phase by applying a machine learning (ML) algorithm. To test the proposed approach, we used the server access logs of a real commercial ad website, Waseet.com. The experiments show that our proposed hybrid approach, which combines the ontology of web traffic illegitimacy with the ML k-NN classifier, detects around 98.6% of fake clicks.
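The two-phase flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ontology reasoning of the first phase is replaced here by a single hand-written user-agent rule, the behavioral features (clicks per minute, mean seconds between clicks) are hypothetical, and the training data is made up purely for illustration. Only the overall shape follows the text, namely a rule-based check on immutable request information followed by a k-NN vote on behavioral patterns.

```python
from collections import Counter
from math import dist

# Hypothetical first-level rule: user-agent substrings that suggest automation.
# The paper infers such rules from an ontology; this tuple is only a stand-in.
BOT_UA_MARKERS = ("bot", "crawler", "spider", "curl")


def first_level_is_nonhuman(user_agent):
    """Phase 1 stand-in: flag traffic whose user agent matches a known marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_UA_MARKERS)


def knn_predict(train, query, k=3):
    """Phase 2: majority vote among the k nearest labeled behavior vectors."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify(user_agent, behavior, train, k=3):
    """Apply the rule-based phase first, then fall back to k-NN on behavior."""
    if first_level_is_nonhuman(user_agent):
        return "non-human"
    return knn_predict(train, behavior, k)


# Made-up training data: (clicks per minute, mean seconds between clicks).
TRAIN = [
    ((1.0, 40.0), "human"),
    ((2.0, 25.0), "human"),
    ((1.5, 30.0), "human"),
    ((30.0, 1.0), "non-human"),
    ((45.0, 0.5), "non-human"),
    ((28.0, 2.0), "non-human"),
]

if __name__ == "__main__":
    print(classify("curl/7.68.0", (1.0, 40.0), TRAIN))
    print(classify("Mozilla/5.0", (40.0, 1.0), TRAIN))
```

In this sketch a request is only passed to the k-NN phase when the first-level rule does not already flag it, which mirrors the ordering of the two detection phases. The choice of k = 3 and of Euclidean distance is an assumption; the abstract does not state either.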
Acknowledgements
The authors would like to thank the CEO of Waseet.com Classified Ads for his restricted permission to use the website’s logs and for his support whenever required, and the anonymous volunteers for carrying out the evaluation.
Cite this article
Almahmoud, S., Hammo, B., Al-Shboul, B. et al. A hybrid approach for identifying non-human traffic in online digital advertising. Multimed Tools Appl 81, 1685–1718 (2022). https://doi.org/10.1007/s11042-021-11533-4
DOI: https://doi.org/10.1007/s11042-021-11533-4