A hybrid approach for identifying non-human traffic in online digital advertising

Multimedia Tools and Applications

Abstract

Click fraud is a serious problem facing the online advertising business. Malicious clicks on online ads, whether committed by humans or by non-humans, impose financial losses on advertisers who use pay-per-click advertising. Non-human traffic is usually designed to inflate web traffic for fraudulent purposes. In this paper, we demonstrate a hybrid approach consisting of a two-level fingerprint applied in two phases to detect illegitimate non-human traffic. The first-level fingerprint is a pattern generated from immutable information about a user navigating a website’s pages. It is used in the first detection phase to infer rules about illegitimate non-human traffic from an ontology of web traffic legitimacy developed for this purpose. The second-level fingerprint is generated from behavioral ad-click patterns and is used in the second detection phase, which applies a Machine-Learning (ML) algorithm. To test the proposed approach, we used the server access logs of a real commercial classified-ads website, Waseet.com. The experiments show that the proposed hybrid approach, combining the ontology of web traffic illegitimacy with the ML k-NN classifier, detects around 98.6% of fake clicks.
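The two-phase flow described above can be pictured with a minimal sketch. It is not the authors' implementation: the ontology-based rule inference of the first phase is replaced here by a plain substring check on the user-agent string, and the names `BOT_AGENT_MARKERS`, `phase1_is_non_human`, `knn_predict`, `classify_click`, and `TRAIN`, as well as the two behavioral features and all sample values, are assumptions made purely for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical phase-1 rule set: user-agent substrings treated as non-human.
# The paper infers its rules from an ontology; this list is only an assumption.
BOT_AGENT_MARKERS = ("bot", "crawler", "spider", "curl")


def phase1_is_non_human(user_agent: str) -> bool:
    """Rule-based check on a first-level (immutable) fingerprint."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_AGENT_MARKERS)


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify_click(user_agent, behavior, train, k=3):
    """Phase 1 flags traffic by rule; otherwise phase 2 applies k-NN."""
    if phase1_is_non_human(user_agent):
        return "fake"
    return knn_predict(train, behavior, k)


# Made-up behavioral features: (clicks per minute, seconds between clicks).
TRAIN = [
    ((1.0, 40.0), "legit"),
    ((2.0, 25.0), "legit"),
    ((1.5, 30.0), "legit"),
    ((30.0, 1.0), "fake"),
    ((45.0, 0.5), "fake"),
    ((25.0, 2.0), "fake"),
]
```

In this sketch a request whose user agent matches a marker is flagged in the first phase without consulting the classifier, while all other requests are labeled by a majority vote of their three nearest behavioral neighbors. Real features would normally be scaled before computing distances.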

Acknowledgements

The authors would like to thank the CEO of Waseet.com Classified Ads for his restricted permission to use the website’s logs and for his support whenever required, and the anonymous volunteers for performing the evaluation.

Author information

Corresponding author

Correspondence to Bashar Al-Shboul.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Almahmoud, S., Hammo, B., Al-Shboul, B. et al. A hybrid approach for identifying non-human traffic in online digital advertising. Multimed Tools Appl 81, 1685–1718 (2022). https://doi.org/10.1007/s11042-021-11533-4

  • DOI: https://doi.org/10.1007/s11042-021-11533-4
