A graph-powered large-scale fraud detection system

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Fraud detection is a common problem in many areas, such as e-commerce, banking, insurance, and social networks, where data can be naturally modeled as a graph. In e-commerce especially, the large scale of the platform and the enormous volume of real-time transactions over millions of items make fraud detection an important and serious problem. The challenges lie in three aspects: sparse fraud samples, complex features in online transactions, and the extremely large scale of e-commerce data. To address these issues, we propose an efficient graph-powered large-scale fraud detection framework. Concretely, we first present a heterogeneous label propagation algorithm to recall more potentially fraudulent samples for model training; we then design a novel multi-view heterogeneous graph neural network model to obtain more accurate fraud predictions; finally, we present a fraud pattern analysis approach to discover hidden fraud groups. In addition, to improve the efficiency and scalability of the framework, we present a large-scale fraud detection system deployed on a general graph computing engine. We conduct experiments on two real-world datasets. The results show that the proposed framework achieves high accuracy and superior scalability on large-scale graph data.
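The first step of the framework, recalling additional suspicious samples from a few known fraud cases, can be illustrated with a generic label propagation sketch. This is not the heterogeneous algorithm proposed in the paper, whose details are not given in the abstract. It is a plain, homogeneous propagation over an undirected graph, and the function name `propagate_labels`, the `damping` parameter, and the toy node names are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, num_iters=10, damping=0.5):
    """Spread fraud scores from known fraud seeds over an undirected graph.

    Generic illustration only (hypothetical helper, not the paper's method).
    edges: iterable of (u, v) pairs; seeds: set of nodes labeled as fraud.
    Returns a dict mapping each node to a score in [0, 1].
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Seeds start (and stay) at 1.0; every other node starts at 0.0.
    scores = {node: 0.0 for node in neighbors}
    for seed in seeds:
        scores[seed] = 1.0

    for _ in range(num_iters):
        updated = {}
        for node in scores:
            nbrs = neighbors.get(node, ())
            if node in seeds or not nbrs:
                # Clamp seeds; isolated nodes keep their current score.
                updated[node] = scores[node]
            else:
                # Damped average of the neighbors' current scores.
                mean = sum(scores[n] for n in nbrs) / len(nbrs)
                updated[node] = damping * mean
        scores = updated
    return scores
```

On a small user-item graph with one seed, nodes closer to the seed receive higher scores, and nodes in a disconnected component stay at zero. Thresholding such scores is one simple way to select extra candidate samples for training.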

Notes

  1. https://www.taobao.com.

  2. Data Availability Statement: The Mooc data that support the findings of this study are available from https://snap.stanford.edu/jodie/#datasets. The Taobao data that support the findings of this study are not openly available due to commercial restrictions, but may be partially released in the future upon reasonable request and with the permission of Alibaba.

Acknowledgements

This work was supported by the China Postdoctoral Science Foundation (2021M692957) and the National Natural Science Foundation of China (62172372).

Author information

Corresponding author

Correspondence to Biao Wang.

About this article

Cite this article

Li, Z., Wang, B., Huang, J. et al. A graph-powered large-scale fraud detection system. Int. J. Mach. Learn. & Cyber. 15, 115–128 (2024). https://doi.org/10.1007/s13042-023-01786-w

  • DOI: https://doi.org/10.1007/s13042-023-01786-w
