Elsevier

Neurocomputing

Volume 461, 21 October 2021, Pages 587-597
A subgraph-based knowledge reasoning method for collective fraud detection in E-commerce

https://doi.org/10.1016/j.neucom.2021.03.134

Abstract

Fraud detection is essential for e-commerce platforms to maintain a fair business environment. Many existing works propose manually designed methods, such as label propagation and dense block mining rules on built user-item graphs, to detect fraud behaviours, but these methods are heuristic and thus have limited performance. Learning-based methods either handle fraud detection well only in the transductive scenario when only structural information is available, or require rich content features to obtain good inductive ability. Considering that content features are not always available in practice and that many fraudulent behaviours involve newly emerging users and items, how to learn effective inductive rules from structure alone is still underexplored. In this paper, we propose a subgraph-based method named SubGNN for collective fraud detection. In SubGNN, first, we extract the subgraphs around the given edges (user behaviours) to be tested. Then, we remove nodes’ global IDs so that SubGNN is entity-independent. Finally, by learning knowledge reasoning rules on extracted heterogeneous subgraphs using our proposed relational graph isomorphism network (R-GIN), a powerful graph neural network (GNN) model, SubGNN can achieve precise fraud detection. Experiments are conducted on publicly available Amazon and Yelp datasets and a newly collected Taobao dataset. The results clearly show the advantages and prospects of our method. When using SubGNN to detect fraudulent transactions on Taobao, the precision is higher than 0.99 and more than 90% of fraud samples are recalled.

Introduction

To better match users’ interests and items, various data-driven recommendation and ranking algorithms [1], [2] have been designed and deployed in existing e-commerce platforms. Although these algorithms can predict users’ interests better and bring more profits to the platform, the data-driven training mechanism makes them easy to poison [3], [4]. For example, if a seller controls some user accounts to frequently click both a popular item and a target item that belongs to the seller, collaborative filtering-based recommendation algorithms may be misled into inferring that the two items are strongly connected and begin to recommend the target item to users who have interacted with the popular item, thus allocating many more impressions [5] to the target item than before.
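The co-visitation effect in this example can be illustrated with a toy item-to-item collaborative filter. The sketch below is only an illustration, not any platform's actual algorithm: it assumes that item similarity is the cosine similarity between binary user-click sets, and the function name `item_cosine`, the item names and the click log are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_cosine(clicks, a, b):
    """Cosine similarity between the binary user-click sets of items a and b."""
    users = defaultdict(set)
    for user, item in clicks:
        users[item].add(user)
    if not users[a] or not users[b]:
        return 0.0
    shared = len(users[a] & users[b])
    return shared / sqrt(len(users[a]) * len(users[b]))


# Hypothetical organic log: five users click the popular item, one clicks the target.
organic = [(f"u{i}", "popular") for i in range(5)] + [("u9", "target")]

# Hypothetical attack: five controlled accounts click both items.
fake = [(f"bot{i}", item) for i in range(5) for item in ("popular", "target")]

before = item_cosine(organic, "popular", "target")
after = item_cosine(organic + fake, "popular", "target")
print(before, after)
```

In this toy log the two items share no users before the attack, so their similarity is zero; after the injected co-clicks they share five users, and a similarity-driven recommender would start to surface the target item next to the popular one.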

Fortunately, the deployed recommendation or ranking algorithms can be misled only if malicious sellers control (or hire) some user accounts and perform a certain amount of fraud behaviours (log data) on real-world e-commerce platforms. Because the number of accounts that malicious sellers can control is limited and they usually promote several target items together each time, these accounts’ behaviours are usually more collective than those of normal users [6]. This provides us with the opportunity to detect fraud behaviours.

There have been many effective solutions based on label propagation [7] and dense block assumptions [6], [8], [9], [10] for fraud detection in the literature. These methods all first build a user-item bipartite graph and then run their algorithms on it. For example, FAP [7] is a representative label propagation-based fraud detection method in which, given some prelabelled fraud users and items, fraud signals are iteratively propagated to other nodes on the built graph. Holo [6] relies on the dense block assumption: it views the behaviours in dense user-item blocks as fraud, because fraud users usually perform more collective behaviours than normal users due to the limited number of cheating accounts and target items.
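The general idea of propagating fraud signals over a user-item bipartite graph can be sketched as follows. This is a deliberately simplified illustration and not the propagation rules of FAP [7]: it assumes that every unlabelled node takes the mean score of its neighbours in each round while prelabelled seed nodes stay fixed at 1.0. The function name `propagate` and the toy graph are hypothetical.

```python
from collections import defaultdict


def propagate(edges, seeds, iterations=10):
    """Spread fraud scores over a user-item bipartite graph.

    edges: iterable of (user, item) pairs.
    seeds: set of nodes prelabelled as fraud, written ("u", user) or ("i", item).
    Assumed update rule: unlabelled nodes take the mean score of their
    neighbours; seed nodes stay clamped at 1.0.
    """
    neighbours = defaultdict(set)
    for user, item in edges:
        neighbours[("u", user)].add(("i", item))
        neighbours[("i", item)].add(("u", user))
    score = {node: 1.0 if node in seeds else 0.0 for node in neighbours}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbours.items():
            if node in seeds:
                updated[node] = 1.0
            else:
                updated[node] = sum(score[n] for n in adjacent) / len(adjacent)
        score = updated
    return score


# Hypothetical graph: f1 and f2 both click targets t1 and t2; n1 and n2 click p1.
edges = [("f1", "t1"), ("f1", "t2"), ("f2", "t1"), ("f2", "t2"),
         ("n1", "p1"), ("n2", "p1")]
scores = propagate(edges, seeds={("u", "f1")})
print(scores[("u", "f2")], scores[("u", "n1")])
```

With only `f1` labelled, the score of `f2` rises towards 1 because the two accounts share the same target items, while the unconnected normal users keep a score of 0. Rules of this kind are fixed by hand, which is the limitation of heuristic methods.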

The methods discussed above have achieved great success on real-world e-commerce platforms. In addition to the insightful designs of these methods, we emphasize that their success is inseparable from their natural inductive property, which ensures that these methods can be used to detect newly emerging users and items directly without any additional change. This is an essential property in the fraud detection area, as fraud sellers are always new and may have no relation with existing detected fraud sellers. However, in spite of their good inductive ability, these methods are all heuristic and rely on manual design, and thus usually have limited performance.

Learning-based techniques for graphs, such as graph representation learning [11], [12], [13], [14] and graph neural networks (GNNs) [15], [16], [17], [18], [19], have gradually matured in recent years, and some researchers have applied them to fraud detection [20], [21], [22], [23], [24] and achieved much better results than existing heuristic techniques. However, these methods either rely heavily on content features, which are not always available in practice, to obtain good performance, or need to train node ID embeddings and thus suffer a serious performance decrease in the inductive scenario, because there are usually many behaviours on new users and items with untrained ID embeddings [24].

In this paper, we propose SubGNN, a subgraph-based method for collective fraud detection that retains the inductive property when only structural information is available. Specifically, SubGNN is based on subgraph reasoning [25], [26]. We focus on the behaviour (edges on user-item graphs) classification task. For each candidate user-item edge, first, we extract the subgraph around it. Then, instead of using the common global ID embeddings as the node features, we mark the nodes with new label IDs on the subgraph so that the model is entity-independent and generalizes better to edges on new users and items. Finally, we design a powerful relational graph isomorphism network (R-GIN) that has strong expressive power and can theoretically approximate the upper bound of the Weisfeiler-Lehman (WL) graph isomorphism test [27] on heterogeneous graphs (bipartite graphs with normal and fraudulent edges in this paper). Benefiting from this, SubGNN can learn complex knowledge reasoning rules on the relabelled heterogeneous subgraphs and perform precise fraud detection. In our experiments, we compare SubGNN with three kinds of representative methods, including label propagation, dense block mining and GNN-based solutions, on publicly available Amazon and Yelp datasets, and the results clearly show the advantages and good prospects of our method. In addition, we test SubGNN on a fraudulent purchase behaviour (transaction) dataset that we collected from a famous e-commerce platform, Taobao. The results show that SubGNN can achieve a very high precision score (0.99+) while recalling more than 90% of fraud examples.
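The first two steps, subgraph extraction and relabelling, can be sketched as below. This is a minimal illustration under stated assumptions rather than the paper's implementation: the hop limit, the exclusion of the candidate edge itself, and the labelling scheme (node side plus hop distances to the two endpoints of the candidate edge) are all assumptions, and the names `bfs_distances` and `relabelled_subgraph` are hypothetical. The R-GIN model that would consume the relabelled subgraph is not shown.

```python
from collections import defaultdict, deque


def bfs_distances(adj, source, max_hops):
    """Hop distance from `source` to every node within `max_hops`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def relabelled_subgraph(edges, user, item, hops=2):
    """Extract the subgraph around the edge (user, item) and drop global IDs.

    edges: (user, item, relation) triples; the candidate edge is excluded.
    Assumed labelling: each node becomes (side, hops to user, hops to item),
    with hops + 1 standing for "not reached".
    """
    kept = [(u, i, r) for u, i, r in edges if (u, i) != (user, item)]
    adj = defaultdict(set)
    for u, i, _ in kept:
        adj[("u", u)].add(("i", i))
        adj[("i", i)].add(("u", u))
    du = bfs_distances(adj, ("u", user), hops)
    di = bfs_distances(adj, ("i", item), hops)
    nodes = set(du) | set(di)
    label = {n: (n[0], du.get(n, hops + 1), di.get(n, hops + 1)) for n in nodes}
    # Sorted list of (user label, relation, item label): no global IDs remain.
    return sorted(
        (label[("u", u)], r, label[("i", i)])
        for u, i, r in kept
        if ("u", u) in nodes and ("i", i) in nodes
    )


# Two hypothetical graphs with the same structure but different global IDs.
g1 = [("a", "y", "fraud"), ("b", "y", "fraud"), ("b", "x", "fraud")]
g2 = [("c", "q", "fraud"), ("d", "q", "fraud"), ("d", "p", "fraud")]
s1 = relabelled_subgraph(g1, "a", "x")
s2 = relabelled_subgraph(g2, "c", "p")
print(s1 == s2)
```

Because the relabelled representation depends only on local structure and edge types, the two candidate edges look identical to a downstream model even though they involve different users and items, which is what makes the approach applicable to previously unseen entities.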

Related works

We review some existing representative heuristic fraud detection methods and discuss recent popular learning-based methods for graphs and their applications in fraud detection in this section.

Zhang et al. [7] propose a label propagation algorithm named FAP, in which they carefully design the propagation rules on the bipartite (user-item) graph, for identifying fraudulent users. Kumar et al. [28] use three metrics, fairness, goodness and reliability, to rank users, items and user-item ratings,

SubGNN: a subgraph-based knowledge reasoning method for collective fraud detection

This paper focuses on fraud behaviour detection (edge classification) on user-item bipartite graphs in e-commerce. Let $G(U,V)$ be a prebuilt user-item graph, where $U$ and $V$ are the user set and item set, respectively. An edge $e_{u,v}$ between a user $u$ and an item $v$ means that $u$ takes an action such as clicking or purchasing on $v$, and we use $E$ to represent the edge set in $G(U,V)$. $n$ and $m$ are the numbers of users and items, respectively, and we have $n=|U|$ and $m=|V|$. For each existing edge $e_{u,v}$ with an

Datasets

We conduct experiments on the Amazon, Yelp and Taobao datasets. The Amazon dataset contains rich user behaviours and features, including user-item reviews and review helpfulness votes. In addition to its wide use in evaluating the effectiveness of recommendation algorithms [2], [40], it is also used for the fraud detection task with the support of the helpfulness votes [41], [28]. Similar to [28],

Conclusion

Fraud detection has become an important task in e-commerce. Considering the common scenario where only structural information is available, existing solutions are either heuristic or designed while ignoring the model generalization to behaviours on newly emerging users and items. To overcome the above problem, in this paper, we propose an inductive subgraph-based method SubGNN with a powerful R-GIN module to model the structural features on heterogeneous graphs. SubGNN has good performance and

CRediT authorship contribution statement

Junshuai Song: Conceptualization, Methodology, Software, Validation, Writing - original draft. Xiaoru Qu: Software, Investigation. Zehong Hu: Methodology, Data curation. Zhao Li: Resources, Writing - review & editing, Funding acquisition. Jun Gao: Resources, Project administration, Writing - review & editing, Funding acquisition. Ji Zhang: Resources, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by NSFC under Grant No. 61832001, Alibaba-PKU joint program, and Zhejiang Lab under Grant No. 2019KB0AB06.

References (42)

  • P. Covington et al., Deep neural networks for YouTube recommendations
  • C. Zhou, J. Bai, J. Song, X. Liu, Z. Zhao, X. Chen, J. Gao, Atrank: An attention-based user behavior modeling framework...
  • G. Yang, N.Z. Gong, Y. Cai, Fake co-visitation injection attacks to recommender systems, NDSS. doi: 10.14722/ndss....
  • J. Song et al., Poisonrec: an adaptive data poisoning framework for attacking black-box recommender systems
  • Z. Li et al., Fair: Fraud aware impression regulation system in large-scale real-time e-commerce search platform
  • S. Liu et al., Holoscope: Topology-and-spike aware fraud detection
  • Y. Zhang, Y. Tan, M. Zhang, Y. Liu, C. Tat-Seng, S. Ma, Catch the black sheep: unified framework for shilling attack...
  • B. Hooi et al., Fraudar: Bounding graph fraud in the face of camouflage
  • K. Shin, B. Hooi, C. Faloutsos, M-zoom: Fast dense-block detection in tensors with quality guarantees, in: Joint...
  • K. Shin et al., D-cube: Dense-block detection in terabyte-scale tensors
  • Q. Wang et al., Knowledge graph embedding: a survey of approaches and applications, IEEE Trans. Knowl. Data Eng. (2017)
  • Y. Lin, X. Han, R. Xie, Z. Liu, M. Sun, Knowledge representation learning: A quantitative review, arXiv preprint...
  • S. Cavallari et al., Learning community embedding with community detection and node embedding on graphs
  • S. Cavallari et al., Embedding both finite and infinite communities on graphs [application notes], IEEE Comput. Intell. Mag. (2019)
  • T.N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint...
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, Graph attention networks, arXiv preprint...
  • W. Hamilton, Z. Ying, J. Leskovec, Inductive representation learning on large graphs, in: Advances in neural...
  • P.W. Battaglia, J.B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A....
  • K. Xu, W. Hu, J. Leskovec, S. Jegelka, How powerful are graph neural networks?, arXiv preprint...
  • Z. Liu et al., Heterogeneous graph neural networks for malicious account detection
  • B. Hu, Z. Zhang, C. Shi, J. Zhou, X. Li, Y. Qi, Cash-out user detection based on attributed heterogeneous information...

    Junshuai Song is a Ph.D. student in the Department of Computer Science, School of Electronics Engineering and Computer Science, Peking University. His research interests include recommender systems, reinforcement learning, graph data management and calculation, and big data-driven security. He has published articles in top journals and conferences, including TKDE, ICDE, AAAI and WWW.

    Xiaoru Qu is a Ph.D. student in the Department of Computer Science, Peking University, China. Her research interests include graph data mining, review classification in e-commerce, and text processing. She has published papers in top conferences including CIKM and WWW.

    Zehong Hu received the Ph.D. degree from Nanyang Technological University in 2019. He is currently an Algorithm Expert in Alibaba Group. His current research interests include artificial intelligence, multiagent systems, and reinforcement learning. Dr. Hu has published several papers in top-tier conferences including NIPS, AAAI, IJCAI and AAMAS.

    Zhao Li received the Ph.D. degree (Hons.) from the Computer Science Department, University of Vermont. He is currently a Senior Staff Scientist with the Alibaba Group, specializing in e-commerce ranking and recommender systems. He has published more than 50 articles in prestigious conferences and journals, including NeurIPS, AAAI, IJCAI, and DMKD. His current research interests include adversarial machine learning, network representation learning, knowledge graphs, multi-agent reinforcement learning, and big data-driven security. He is also a Technical Committee Member of the China Computer Federation on Database.

    Jun Gao received his B.E. and M.E. degrees in Computer Science from Shandong University, China, in 1997 and 2000, respectively, and his Ph.D. degree in Computer Science from Peking University in 2003. He is currently a Professor in the School of Electronics Engineering and Computer Science, Peking University, China. His major research interests include web data management and graph data management.

    Ji Zhang is currently an Associate Professor in computer science at the University of Southern Queensland (USQ), Australia. He is an IEEE senior member, ACM member, Australian Endeavour Fellow, Queensland Fellow (Australia) and Izaak Walton Killam Scholar (Canada). His research interests are big data analytics, knowledge discovery and data mining (KDD), and information privacy and security. He received his Ph.D. degree from the Faculty of Computer Science at Dalhousie University, Canada, in 2008. He has published over 160 papers in major peer-reviewed international journals and conferences.