A hybrid approach for identifying non-human traffic in online digital advertising

Almahmoud, Sawsan; Hammo, Bassam; Al-Shboul, Bashar; Obeid, Nadim

doi:10.1007/s11042-021-11533-4


Published: 11 October 2021

Volume 81, pages 1685–1718, (2022)

Multimedia Tools and Applications

Sawsan Almahmoud¹,
Bassam Hammo²,³,
Bashar Al-Shboul ORCID: orcid.org/0000-0002-5214-6429⁴ &
…
Nadim Obeid²


Abstract

Click fraud is a serious problem facing the online advertising business. Malicious clicks on online ads, whether committed by humans or by non-humans, impose financial losses on advertisers who use pay-per-click advertising. Non-human traffic is usually designed to inflate web traffic for fraudulent purposes. In this paper, we demonstrate a hybrid approach consisting of a two-level fingerprint applied in two phases to detect illegitimate non-human traffic. The first-level fingerprint is a pattern generated from immutable information about a user navigating a website’s pages. It is used in the first detection phase to infer rules about illegitimate non-human traffic from an ontology of web traffic legitimacy developed for this purpose. The second-level fingerprint is generated from behavioral ad-click patterns and is used in the second detection phase by applying a machine-learning (ML) algorithm. To test the proposed approach, the server access logs of a real commercial ads website, Waseet.com, were used. The experiments show that the proposed hybrid approach, combining the ontology of web traffic illegitimacy with the ML k-NN classifier, detects around 98.6% of fake clicks.
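The two-phase idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ontology-inferred rules are replaced here by a simple user-agent substring check, and every name, marker, feature, and training value below (`BOT_AGENT_MARKERS`, the two click features, `TRAIN`) is a hypothetical stand-in chosen only to show how a rule phase can hand undecided traffic to a k-NN classifier.

```python
from collections import Counter
from math import dist

# Hypothetical stand-in for rules inferred from the ontology (assumption):
# user-agent substrings that indicate automated clients.
BOT_AGENT_MARKERS = ("curl", "bot", "spider")


def phase_one_is_non_human(user_agent: str) -> bool:
    """Phase 1: flag traffic whose user agent contains an automation marker."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_AGENT_MARKERS)


def knn_predict(train, query, k=3):
    """Phase 2: majority label among the k nearest training points (Euclidean)."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify(user_agent, click_features, train, k=3):
    """Apply the rule phase first; fall back to k-NN on behavioral features."""
    if phase_one_is_non_human(user_agent):
        return "non-human"
    return knn_predict(train, click_features, k)


# Hypothetical labeled examples (invented for illustration):
# features are (clicks per minute, mean seconds between clicks).
TRAIN = [
    ((1.0, 40.0), "human"),
    ((2.0, 25.0), "human"),
    ((1.5, 30.0), "human"),
    ((30.0, 1.0), "non-human"),
    ((45.0, 0.5), "non-human"),
    ((25.0, 2.0), "non-human"),
]

if __name__ == "__main__":
    print(classify("curl/7.68.0", (1.0, 40.0), TRAIN))
    print(classify("Mozilla/5.0", (1.2, 35.0), TRAIN))
    print(classify("Mozilla/5.0", (40.0, 0.8), TRAIN))
```

Running the rule phase first means obviously automated clients never reach the classifier, so the k-NN step only has to separate traffic that looks legitimate at the fingerprint level. In practice the features would need scaling before computing distances; the sketch omits that for brevity.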



Acknowledgements

The authors would like to thank the CEO of Waseet.com Classified Ads for his restricted permission to use the website’s logs and for his support whenever required, and the anonymous volunteers for carrying out the evaluation.

Author information

Authors and Affiliations

Department of Computer Science, King Abdullah II School of Information Technology, The University of Jordan, Amman, 11942, Jordan
Sawsan Almahmoud
Department of Computer Information Systems, King Abdullah II School of Information Technology, The University of Jordan, Amman, 11942, Jordan
Bassam Hammo & Nadim Obeid
King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Amman, Jordan
Bassam Hammo
Department of Information Technology, King Abdullah II School of Information Technology, The University of Jordan, Amman, 11942, Jordan
Bashar Al-Shboul


Corresponding author

Correspondence to Bashar Al-Shboul.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Almahmoud, S., Hammo, B., Al-Shboul, B. et al. A hybrid approach for identifying non-human traffic in online digital advertising. Multimed Tools Appl 81, 1685–1718 (2022). https://doi.org/10.1007/s11042-021-11533-4


Received: 25 May 2020
Revised: 08 January 2021
Accepted: 29 August 2021
Published: 11 October 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s11042-021-11533-4
