Abstract
Fraud detection is often likened to finding a needle in a haystack and remains challenging because fraudulent acts are buried in massive amounts of normal behavior and true intentions may be disguised in any single snapshot. Fraudulent incidents typically unfold over consecutive time steps to gain illegal benefits, which provides unique clues for probing fraudulent behavior by considering the complete behavioral sequence rather than a single snapshot. Additionally, fraudulent behavior may involve different parties, so the interaction patterns between sources and targets can help distinguish fraudulent acts from normal behavior. In this paper, we therefore model the attributed behavioral sequences generated from consecutive behaviors in order to capture sequential patterns, so that sequences deviating from these patterns can be detected as fraudulent. Considering the characteristics of behavioral sequences, we propose a novel model, NHA-LSTM, which augments the traditional LSTM with a modified forget gate that accounts for the time interval between consecutive steps. Furthermore, we design a self-historical attention mechanism to capture long-term dependencies, which helps identify repeated or cyclical behavior. In addition, we propose an enhanced network embedding method, FraudWalk, which constructs embeddings for the nodes in the interaction network with respect to higher-order interactions and specific time constraints, revealing potential group fraud. The node embeddings, along with the feature vectors, are fed into the model to capture the interactions between sources and targets. To validate the effectiveness of the sequential behavior embeddings, we conduct prediction and classification experiments on a real-world telecommunication dataset using the learned embeddings. The experimental results show that the learned embeddings better identify fraudulent behavior. Finally, we visualize the attention weights to provide an interpretable view of human behavioral patterns.
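
The abstract does not spell out the gating equations, but the two sequence-level ideas it names, a forget gate damped by the inter-event interval and attention over the sequence's own history, can be pictured with a minimal PyTorch-style sketch. This is a sketch under assumptions, not the paper's exact formulation: the layer names, the exponential-decay form of the time modulation, and the dot-product attention scoring are all illustrative choices.

```python
# Illustrative sketch only: an LSTM-style cell whose forget gate is modulated
# by the elapsed time between consecutive events, plus attention over the
# cell's own past hidden states. The decay form and scoring are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeAwareAttnLSTMCell(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        # input, forget, output, and candidate gates computed jointly
        self.gates = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)
        # maps the inter-event interval to a per-dimension decay factor
        self.time_decay = nn.Linear(1, hidden_dim)

    def forward(self, x, dt, h_prev, c_prev, history):
        # x: (B, input_dim); dt: (B, 1) interval since the previous event
        # history: (B, T, hidden_dim) hidden states of earlier steps (T may be 0)
        z = self.gates(torch.cat([x, h_prev], dim=-1))
        i, f, o, g = z.chunk(4, dim=-1)
        i, o, g = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(g)
        # forget gate damped by elapsed time: longer gaps forget more
        decay = torch.exp(-F.softplus(self.time_decay(dt)))
        f = torch.sigmoid(f) * decay
        c = f * c_prev + i * g
        h = o * torch.tanh(c)
        if history.size(1) > 0:
            # self-historical attention: attend over this sequence's own past
            # hidden states to surface repeated or cyclical behavior
            scores = torch.bmm(history, h.unsqueeze(-1)).squeeze(-1)  # (B, T)
            alpha = torch.softmax(scores, dim=-1)
            context = torch.bmm(alpha.unsqueeze(1), history).squeeze(1)
            h = h + context
        return h, c
```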
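
The FraudWalk component can similarly be pictured as random walks over the source-target interaction graph that only follow interactions close in time, with the resulting walks fed to a skip-gram trainer to obtain node embeddings. The walk length, the `max_gap` time window, and the undirected treatment of interactions below are assumptions for illustration; the abstract does not specify the actual walk bias.

```python
# Illustrative sketch of time-constrained random walks over an interaction
# graph, in the spirit of DeepWalk/node2vec. Parameter names and the time
# constraint (only follow edges within max_gap of the previous hop) are
# assumptions for illustration.
import random
from collections import defaultdict

def build_graph(interactions):
    """interactions: iterable of (source, target, timestamp) records."""
    adj = defaultdict(list)
    for src, dst, ts in interactions:
        adj[src].append((dst, ts))
        adj[dst].append((src, ts))  # treat each interaction as undirected
    return adj

def time_constrained_walk(adj, start, walk_len=20, max_gap=3600.0):
    walk, node, last_ts = [start], start, None
    for _ in range(walk_len - 1):
        if last_ts is None:
            candidates = adj[node]
        else:
            # only continue along interactions close in time to the previous hop
            candidates = [(n, t) for n, t in adj[node] if abs(t - last_ts) <= max_gap]
        if not candidates:
            break
        node, last_ts = random.choice(candidates)
        walk.append(node)
    return walk

def generate_walks(adj, walks_per_node=10, **kw):
    walks = []
    for node in list(adj):
        for _ in range(walks_per_node):
            walks.append([str(n) for n in time_constrained_walk(adj, node, **kw)])
    return walks
```

The generated walks can then be passed to any skip-gram implementation to produce node embeddings, which are concatenated with the per-step feature vectors before being fed into the sequence model.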
Additional information
Dr. Liu was supported by the National Natural Science Foundation of China (NSFC) under Grant 71701007. Dr. Wu was supported by the National Key R&D Program of China (2019YFB2101804) and the NSFC under Grants 71725002, 71531001, U1636210, and 71490723. Dr. Zuo was supported by the NSFC under Grant 71901012 and the China Postdoctoral Science Foundation under Grant 2018M640045.
Cite this article
Liu, G., Guo, J., Zuo, Y. et al. Fraud detection via behavioral sequence embedding. Knowl Inf Syst 62, 2685–2708 (2020). https://doi.org/10.1007/s10115-019-01433-3