Abstract
Online advertising models are vulnerable to click fraud, which occurs when an individual or group repeatedly clicks on an online advertisement to generate illegitimate clicks and earn revenue at the advertiser's expense. In machine learning-based click fraud detection, model performance can be degraded by collinear, redundant, and insignificant features in the dataset, which lead to overfitting: the model becomes too complex and fails to generalize to new data. Therefore, a Manifold Criterion Variable Elimination method is proposed in this work, which exploits six filter-based feature selection techniques to select significant features for discriminating fraudulent from genuine publishers. Experiments are conducted on an online advertisement user-click dataset in two modes: first with all extracted features and second with only the selected features. For each class instance, labelled OK, Fraud, or Observation, 103 statistical features are extracted from the user-click dataset, from which the Manifold Criterion Variable Elimination method selects the top 15 most relevant. Individual and ensemble learning models are trained with the selected feature set and tuned parameter values, and their performance is evaluated using standard measures. The results demonstrate that the performance of all learners generally improves with the selected feature set. In particular, the Gradient Tree Boosting (GTB) ensemble performs best, as it iteratively merges weak learners into a strong one while minimizing the model's loss.
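The multi-filter selection described above can be illustrated with a minimal sketch: several filter criteria each rank every feature, the per-criterion ranks are aggregated, and the top-k features by aggregate rank are kept. This is an assumption-laden simplification, not the paper's method: the function name `rank_aggregate_select` is hypothetical, and the six filter techniques of the proposed Manifold Criterion Variable Elimination are replaced here by three simple stand-ins (absolute Pearson correlation, an ANOVA-style F score, and feature variance).

```python
import numpy as np

def rank_aggregate_select(X, y, k):
    """Aggregate per-feature ranks from several filter criteria and keep
    the top-k features (a sketch of multi-filter rank fusion; hypothetical
    stand-ins for the paper's six filter-based techniques)."""
    # Criterion 1: absolute Pearson correlation of each feature with the label
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    # Criterion 2: ANOVA-style F score (between-class over within-class variance)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    between = ((means - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.array([X[y == c].var(axis=0) for c in classes]).sum(axis=0) + 1e-12
    fscore = between / within
    # Criterion 3: raw feature variance
    var = X.var(axis=0)
    # Rank features under each criterion (rank 0 = best) and sum the ranks
    ranks = sum(np.argsort(np.argsort(-s)) for s in (corr, fscore, var))
    # Indices of the k features with the best (smallest) aggregate rank
    return np.argsort(ranks)[:k]
```

On the real dataset, the selected indices would then feed a gradient boosting classifier (e.g. scikit-learn's `GradientBoostingClassifier`) trained only on `X[:, selected]`, mirroring the two-mode comparison described in the abstract.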
© 2023 IFIP International Federation for Information Processing
Cite this paper
Singh, L., Sisodia, D., Taranath, N.L. (2023). Gradient Boosting-Based Predictive Click Fraud Detection Using Manifold Criterion Variable Elimination. In: Chandran K R, S., N, S., A, B., Hamead H, S. (eds) Computational Intelligence in Data Science. ICCIDS 2023. IFIP Advances in Information and Communication Technology, vol 673. Springer, Cham. https://doi.org/10.1007/978-3-031-38296-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38295-6
Online ISBN: 978-3-031-38296-3