DOI: 10.1145/3377713.3377735
research-article

Stacked-SVM: A Dynamic SVM Framework for Telephone Fraud Identification from Imbalanced CDRs

Published: 07 February 2020

ABSTRACT

Telephone fraud has become rampant in recent years alongside the development of modern communication technology. Telephone fraud identification faces two main challenges: (1) telephone fraud records are typically imbalanced data because of their heterogeneous spatial-temporal distribution, which biases classifiers toward predicting the majority class; (2) traditional evaluation metrics in imbalanced learning rely mainly on accuracy or precision, neglecting the completeness of telephone fraud identification (the share of fraudulent records actually identified, i.e., recall) in real-world applications.
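
To make the majority-class bias concrete, here is a minimal Python sketch using hypothetical toy labels (10 fraud records among 1,000; not the paper's data). A degenerate classifier that always predicts "normal" reaches 99% accuracy yet identifies no fraud at all, which is exactly the completeness that accuracy fails to measure.

```python
# Hypothetical toy labels: 1 = fraud (minority), 0 = normal (majority).
y_true = [1] * 10 + [0] * 990

# A degenerate classifier that always predicts the majority class.
y_pred = [0] * len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

accuracy = correct / len(y_true)  # dominated by the 990 normal records
recall = tp / (tp + fn)           # share of fraud records actually caught

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.99 recall=0.00
```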

To address the limitations of traditional methods, we propose the Stacked-SVM framework, based on heterogeneous ensemble learning and support vector machines (SVMs). First, we employ both edited nearest neighbors (ENN) and adaptive synthetic sampling (ADASYN) to alleviate the curse of dimensionality in imbalanced data resampling. Second, we propose an optimal linear combination strategy for the iterations of Stacked-SVM and demonstrate its validity using the Kullback-Leibler divergence. Finally, we construct the Stacked-SVM framework subject to the constraints of the SVM loss function. We further compare its performance under different evaluation metrics (accuracy, precision, recall, F1-score, and AUC) with four other traditional telephone fraud identification methods: Logistic Regression, Isolation Forest, SVM with random parameter settings, and optimized SVM.
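
The two resampling steps named above can be sketched in pure Python. The abstract gives neither their order nor their parameters, so the following are assumptions: ADASYN (He et al., 2008) first oversamples the minority (fraud) class with k = 5 neighbours, and ENN (Wilson, 1972) then removes only majority-class records whose 3 nearest neighbours mostly disagree with their label. The helper names `knn`, `adasyn`, and `enn` and the toy data are hypothetical; in practice a library such as imbalanced-learn provides both methods.

```python
import math
import random
from collections import Counter


def knn(points, query, k, exclude=None):
    """Indices of the k points nearest to `query` (Euclidean), skipping index `exclude`."""
    dists = sorted((math.dist(p, query), i) for i, p in enumerate(points) if i != exclude)
    return [i for _, i in dists[:k]]


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """ADASYN: generate more synthetic minority samples around minority points
    whose neighbourhoods are dominated by the majority class."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    X_min = [X[i] for i in min_idx]
    G = int((len(y) - 2 * len(min_idx)) * beta)  # (n_majority - n_minority) * beta
    # r_i: share of majority-class points among the k neighbours of each minority point
    r = [sum(y[j] != minority for j in knn(X, X[i], k, exclude=i)) / k for i in min_idx]
    total = sum(r)
    X_new, y_new = list(X), list(y)
    if G <= 0 or total == 0:
        return X_new, y_new  # already balanced, or no hard-to-learn minority points
    for pos, r_i in enumerate(r):
        g_i = round(G * r_i / total)  # this point's share of the synthetic budget
        nbrs = knn(X_min, X_min[pos], k, exclude=pos)  # minority-class neighbours only
        if not nbrs:
            continue
        for _ in range(g_i):
            x_i, x_z = X_min[pos], X_min[rng.choice(nbrs)]
            lam = rng.random()
            # Interpolate on the segment between x_i and a minority neighbour x_z
            X_new.append(tuple(a + lam * (b - a) for a, b in zip(x_i, x_z)))
            y_new.append(minority)
    return X_new, y_new


def enn(X, y, minority=1, k=3):
    """Edited Nearest Neighbours, applied here only to the majority class:
    drop a majority record unless most of its k neighbours share its label."""
    keep = []
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        if y_i == minority:
            keep.append(i)
            continue
        nbrs = knn(X, x_i, k, exclude=i)
        agree = sum(y[j] == y_i for j in nbrs)
        if 2 * agree > len(nbrs):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical 2-D toy data: 60 "normal" records and 8 "fraud" records
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(8)]
y = [0] * 60 + [1] * 8
X_res, y_res = enn(*adasyn(X, y))
print(Counter(y), "->", Counter(y_res))
```

Running ENN after ADASYN mirrors the common SMOTE-ENN pattern: oversampling first fills in the minority region, and the editing pass then trims majority records that sit inside it.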

We evaluate Stacked-SVM in a series of experiments on real telephone fraud data sets, in the form of call detail records (CDRs), from a Chinese domestic telecom operator. The experimental results show that Stacked-SVM achieves 93.83% recall and 82.96% accuracy in telephone fraud identification, making it more precise and robust than the other models.
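
The reported figures rest on the standard confusion-matrix definitions. The sketch below computes all five metrics named in the abstract from real-valued decision scores. The threshold of 0 (the usual SVM decision boundary) and the rank-based AUC estimate are assumptions, since the abstract does not say how the authors computed them, and the toy labels and scores are hypothetical. Recall, TP / (TP + FN), is the metric that captures completeness: the share of fraudulent records actually flagged.

```python
def evaluate(y_true, scores, threshold=0.0):
    """Accuracy, precision, recall, F1 and ROC AUC for binary labels (1 = fraud).

    `scores` are real-valued decision scores (for an SVM, e.g. signed distances to
    the hyperplane); a record is flagged as fraud when its score exceeds `threshold`.
    """
    y_pred = [1 if s > threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC = probability that a random fraud record outscores a random normal one
    # (ties count as half), i.e. the normalised Mann-Whitney U statistic.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}


# Hypothetical toy labels and scores, not the paper's data
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [2.1, 0.4, -0.3, 0.8, -0.5, -1.2, -0.9, -2.0]
print({k: round(v, 3) for k, v in evaluate(y_true, scores).items()})
# {'accuracy': 0.75, 'precision': 0.667, 'recall': 0.667, 'f1': 0.667, 'auc': 0.867}
```

Lowering `threshold` flags more records, which typically raises recall at the cost of precision; the AUC value does not depend on the threshold at all.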

      • Published in

        ACM Other conferences
        ACAI '19: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence
        December 2019
        614 pages
        ISBN:9781450372619
        DOI:10.1145/3377713

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • Research
        • Refereed limited

        Acceptance Rates

        ACAI '19 Paper Acceptance Rate: 97 of 203 submissions, 48%
        Overall Acceptance Rate: 173 of 395 submissions, 44%
