Gradient Boosting-Based Predictive Click Fraud Detection Using Manifold Criterion Variable Elimination

  • Conference paper
  • First Online:
Computational Intelligence in Data Science (ICCIDS 2023)

Part of the book series: IFIP Advances in Information and Communication Technology ((IFIPAICT,volume 673))


Abstract

Online advertising models are vulnerable to click fraud, in which an individual or a group repeatedly clicks on an online advertisement to generate illegitimate clicks and extract money from the advertiser. In machine learning-based approaches to click fraud detection, model performance can be degraded by collinear, redundant, and insignificant features in the dataset. Such features can lead to overfitting, where the model becomes too complex and fails to generalize to new data. This work therefore proposes a Manifold Criterion Variable Elimination method that exploits six filter-based feature selection techniques to select significant features for discriminating fraudulent from genuine publishers. Experiments are conducted on an online advertisement user-click dataset in two modes: first with all extracted features, and second with only the selected features. For each class instance, labelled OK, Fraud, or Observation, 103 statistical features are extracted from the user-click dataset, from which the Manifold Criterion Variable Elimination method selects the 15 most relevant. Individual and ensemble learning models are trained on the selected feature set with tuned parameter values, and their performance is evaluated using standard measures. The results demonstrate that the selected feature set generally improves the performance of all learners. In particular, the Gradient Tree Boosting (GTB) ensemble performs best, iteratively combining weak learners into a strong one by minimizing the model's loss.
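The pipeline described in the abstract (rank features with several filter criteria, keep the top 15, then train a Gradient Tree Boosting classifier) can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the actual Manifold Criterion Variable Elimination combines six filter techniques on the real user-click dataset, whereas this sketch averages only two standard filter scores (mutual information and the ANOVA F-statistic) on synthetic data, and all function and variable names here are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 103 extracted statistical features.
X, y = make_classification(n_samples=500, n_features=103,
                           n_informative=15, random_state=0)

def rank_features(X, y, k=15):
    """Average the per-criterion ranks of each feature and return the
    indices of the top-k features (rank 0 = most relevant)."""
    scores = [mutual_info_classif(X, y, random_state=0),
              f_classif(X, y)[0]]
    # Convert each score vector to ranks so different scales agree.
    ranks = np.mean([np.argsort(np.argsort(-s)) for s in scores], axis=0)
    return np.argsort(ranks)[:k]

top = rank_features(X, y, k=15)

# Train Gradient Tree Boosting on the reduced feature set only.
X_tr, X_te, y_tr, y_te = train_test_split(X[:, top], y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
```

Averaging ranks rather than raw scores is one simple way to merge criteria whose scores live on different scales; the paper's actual merging criterion may differ.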



Author information

Corresponding author

Correspondence to Lokesh Singh.


Copyright information

© 2023 IFIP International Federation for Information Processing

About this paper

Cite this paper

Singh, L., Sisodia, D., Taranath, N.L. (2023). Gradient Boosting-Based Predictive Click Fraud Detection Using Manifold Criterion Variable Elimination. In: Chandran K R, S., N, S., A, B., Hamead H, S. (eds) Computational Intelligence in Data Science. ICCIDS 2023. IFIP Advances in Information and Communication Technology, vol 673. Springer, Cham. https://doi.org/10.1007/978-3-031-38296-3_22

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-38296-3_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-38295-6

  • Online ISBN: 978-3-031-38296-3

