Fraud detection for job placement using hierarchical clusters-based deep neural networks

Kim, Jeongrae; Kim, Han-Joon; Kim, Hyoungrae

doi:10.1007/s10489-019-01419-2

Published: 08 February 2019

Volume 49, pages 2842–2861, (2019)

Applied Intelligence

Abstract

Fraud detection is becoming an integral part of business intelligence, because detecting fraud in a company's work processes is of great value. Fraud hinders accurate appraisal when an enterprise is evaluated, and it is a direct economic loss to the business. Previous fraud detection studies have achieved limited performance gains because they learn a single fraud pattern from the whole data set. This paper proposes a novel method that uses hierarchical clusters with deep neural networks to detect finer-grained frauds, as well as frauds visible in the whole data, in the work processes of job placement. The proposed method, Hierarchical Clusters-based Deep Neural Networks (HC-DNN), uses the anomaly characteristics of hierarchical clusters, pre-trained through an autoencoder, as the initial weights of deep neural networks to detect various frauds. HC-DNN both improves detection performance and helps explain the relationships among fraud types. In a cross-validation evaluation, the proposed method outperforms conventional methods. From the viewpoint of explainable deep learning, the hierarchical cluster structure constructed by HC-DNN can also represent the relationships among fraud types.
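
The abstract describes the pipeline only at a high level, so the following is a rough sketch of the general idea rather than the authors' implementation. It pre-trains one small autoencoder per node of a two-level cluster hierarchy, stacks the encoder weights, and uses them as the initial hidden layer of a classifier that is then fine-tuned on fraud labels. Everything concrete here is an assumption made for illustration: the `bisect` split (a crude stand-in for a real hierarchical clustering algorithm), the single hidden layer, the layer sizes, the learning rates, and the toy data.

```python
import math
import random

def sigmoid(z):
    # tanh form avoids overflow for large |z|.
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def encode(W, b, x):
    # Hidden activations of one sigmoid layer for a single input row.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def train_autoencoder(rows, hidden, epochs=300, lr=0.1, seed=0):
    """Fit a one-hidden-layer autoencoder by SGD; return encoder (W, b)."""
    rng = random.Random(seed)
    d = len(rows[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b = [0.0] * hidden
    V = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d)]
    c = [0.0] * d
    for _ in range(epochs):
        for x in rows:
            h = encode(W, b, x)
            out = [sum(v * hj for v, hj in zip(V[i], h)) + c[i] for i in range(d)]
            err = [o - xi for o, xi in zip(out, x)]
            # Gradient of the squared error w.r.t. each hidden pre-activation.
            gh = [sum(err[i] * V[i][j] for i in range(d)) * h[j] * (1 - h[j])
                  for j in range(hidden)]
            for i in range(d):
                c[i] -= lr * err[i]
                for j in range(hidden):
                    V[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                b[j] -= lr * gh[j]
                for k in range(d):
                    W[j][k] -= lr * gh[j] * x[k]
    return W, b

def bisect(rows):
    """Hypothetical split: cut at the mean of the highest-variance feature."""
    d = len(rows[0])
    means = [sum(r[k] for r in rows) / len(rows) for k in range(d)]
    var = [sum((r[k] - means[k]) ** 2 for r in rows) for k in range(d)]
    k = var.index(max(var))
    return ([r for r in rows if r[k] <= means[k]],
            [r for r in rows if r[k] > means[k]])

def pretrain_hierarchy(rows, hidden_per_node=2):
    """Pre-train one autoencoder per hierarchy node and stack the encoders."""
    nodes = [rows] + [child for child in bisect(rows) if child]
    W0, b0 = [], []
    for i, node in enumerate(nodes):
        W, b = train_autoencoder(node, hidden_per_node, seed=i)
        W0 += W
        b0 += b
    return W0, b0, len(nodes)

def fine_tune(rows, labels, W, b, epochs=300, lr=0.1):
    """Fine-tune a hidden layer initialised from (W, b) plus a logistic output."""
    W = [row[:] for row in W]  # copy so the pre-trained weights stay intact
    b = b[:]
    u, u0 = [0.0] * len(W), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            h = encode(W, b, x)
            p = sigmoid(sum(uj * hj for uj, hj in zip(u, h)) + u0)
            g = p - y  # gradient of the cross-entropy w.r.t. the output logit
            for j in range(len(W)):
                gh = g * u[j] * h[j] * (1 - h[j])
                u[j] -= lr * g * h[j]
                b[j] -= lr * gh
                for k in range(len(x)):
                    W[j][k] -= lr * gh * x[k]
            u0 -= lr * g
    return lambda x: sigmoid(
        sum(uj * hj for uj, hj in zip(u, encode(W, b, x))) + u0)

# Toy records (made up): three scaled features each, label 1 = fraud.
rows = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.9, 0.8, 0.9],
        [0.8, 0.9, 0.8], [0.1, 0.9, 0.1], [0.9, 0.1, 0.9]]
labels = [0, 0, 0, 0, 1, 1]

W0, b0, n_nodes = pretrain_hierarchy(rows)
predict = fine_tune(rows, labels, W0, b0)
scores = [predict(x) for x in rows]
```

Each score in `scores` is a fraud probability between 0 and 1 for the corresponding record. Because every block of hidden units comes from one node of the hierarchy, the units can be traced back to the cluster that produced them, which is one plausible reading of how a cluster hierarchy could support explanation.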

Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2018R1D1A1A02086148), and by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-08-01417) supervised by the IITP (Institute for Information & communications Technology Promotion).

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, University of Seoul, Seoul, Republic of Korea
Jeongrae Kim & Han-Joon Kim
KEIS, Eumseong, Republic of Korea
Hyoungrae Kim

Corresponding author

Correspondence to Han-Joon Kim.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kim, J., Kim, HJ. & Kim, H. Fraud detection for job placement using hierarchical clusters-based deep neural networks. Appl Intell 49, 2842–2861 (2019). https://doi.org/10.1007/s10489-019-01419-2

Published: 08 February 2019
Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s10489-019-01419-2
