Stacked-SVM: A Dynamic SVM Framework for Telephone Fraud Identification from Imbalanced CDRs

Authors:
Qingqing Chang, Beijing University of Technology, Beijing, China
Shaofu Lin, Beijing Institute of Smart City, Beijing University of Technology, Beijing, China
Xiliang Liu, Beijing Institute of Smart City, Beijing University of Technology, Beijing, China
ACAI '19: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, December 2019, Pages 112–120. https://doi.org/10.1145/3377713.3377735

Published: 07 February 2020

ABSTRACT

Recent years have witnessed rampant telephone fraud alongside the development of modern communication technology. Telephone fraud identification faces two main challenges: (1) telephone fraud records are typically imbalanced because of their heterogeneous spatial-temporal distribution, which biases classifiers towards predicting the majority class; (2) traditional evaluation metrics in imbalanced learning rely mainly on accuracy or precision and neglect the completeness of telephone fraud identification in real-world deployments.
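The second challenge can be illustrated with a minimal sketch. The counts below (95 normal calls, 5 fraudulent calls) are made-up toy numbers, not data from the paper, and `binary_metrics` is a hypothetical helper written for this illustration. It shows how a classifier that always predicts the majority class scores high accuracy while identifying no fraud at all.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy imbalanced labels (assumed for illustration): 1 = fraud, 0 = normal.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On this toy input the accuracy is high even though recall is zero, which is why recall-oriented metrics matter when the goal is to catch as many fraudulent calls as possible.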

To address the limitations of traditional methods, we propose the Stacked-SVM framework, based on heterogeneous ensemble learning and support vector machines (SVMs). First, we employ both edited nearest neighbors (ENN) and adaptive synthetic sampling (ADASYN) to alleviate the curse of dimensionality when resampling imbalanced data. Second, we propose an optimal linear combination strategy for the iterations of Stacked-SVM and demonstrate its validity using the Kullback-Leibler divergence. Finally, we construct the Stacked-SVM framework subject to the constraints of the SVM loss function. We then compare its performance under several evaluation metrics (accuracy, precision, recall, F1-score, and AUC) against four traditional telephone fraud identification methods: Logistic Regression, Isolation Forest, SVM with random parameter settings, and optimized SVM.
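As a rough illustration of the ENN step mentioned above, the following sketch applies the generic edited-nearest-neighbors rule to a toy data set: a majority-class sample is dropped when most of its k nearest neighbors carry a different label. This is not the authors' implementation. The function name, the choice of k = 3, the Euclidean distance, the decision to edit only the majority class, and the toy points are all assumptions made for this example, and the ADASYN oversampling step is not shown.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def edited_nearest_neighbors(X, y, k=3, majority_label=0):
    """Remove majority-class samples whose k nearest neighbors mostly disagree.

    Generic ENN rule; minority samples are always kept (an assumption
    commonly made when ENN is used for imbalanced data cleaning).
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)
            continue
        # Indices of the k closest other points to xi.
        neighbors = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: dist(xi, X[j]),
        )[:k]
        votes = Counter(y[j] for j in neighbors)
        # Keep the sample only if the neighborhood majority agrees with it.
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (assumed): label 0 = normal (majority), label 1 = fraud (minority).
# The last point is a normal sample sitting inside the fraud cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.05, 1.05)]
y = [0, 0, 0, 0, 1, 1, 1, 0]

X_res, y_res = edited_nearest_neighbors(X, y, k=3, majority_label=0)
print(len(X), "->", len(X_res), "samples; labels:", y_res)
```

In this toy case the only sample removed is the normal point embedded in the fraud cluster, which cleans the class boundary before any oversampling is applied.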

We evaluate Stacked-SVM in a series of experiments on real telephone fraud data sets, in the form of calling detail records (CDRs), from a Chinese domestic telecom operator. The experimental results show that the proposed Stacked-SVM achieves 93.83% recall and 82.96% accuracy in telephone fraud identification, and is more precise and robust than the other models.

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
2. Theory of computation
  1. Design and analysis of algorithms

Published in

ACAI '19: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence
December 2019
614 pages
ISBN:9781450372619
DOI:10.1145/3377713

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 7 February 2020
Author Tags
Adaptive Synthetic Sampling
Calling Detail Record
Edited Nearest Neighbors
Evaluation
Imbalanced data
Support Vector Machine
Telephone fraud
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
ACAI '19 Paper Acceptance Rate: 97 of 203 submissions, 48%
Overall Acceptance Rate: 173 of 395 submissions, 44%
Article Metrics
- Total Citations: 2
- Total Downloads: 132
- Downloads (Last 12 months): 16
- Downloads (Last 6 weeks): 0