Fraud detection in the distributed graph database

Srivastava, Sakshi; Singh, Anil Kumar

doi:10.1007/s10586-022-03540-3

Published: 24 January 2022

Cluster Computing, volume 26, pages 515–537 (2023)

Abstract

Over the last few decades, graphs have become increasingly important for managing big data across many applications and domains. Big data analysis in a graph database means analyzing massive, interconnected data whose volume grows exponentially over time. Analyzing such connected data is challenging, particularly in social networks and in synthetic identity detection. Previous approaches run fraud detection on the complete graph, which is time-consuming and creates bottlenecks during query execution. To overcome this issue, this paper proposes a new fraud detection technique that unveils synthetic identities in the Panama Papers leak dataset (an unprecedented leak of 11.5 million documents from the database of Mossack Fonseca, the world’s fourth-biggest offshore law firm). The technique applies a node rank-based fraud detection algorithm and integrates distributed data profiling techniques on a minimized graph, obtained by removing the least influential nodes. The proposed model is verified on a three-node cluster, where it improves data scalability, reduces query execution time by an average of 30–36%, and reduces fraud detection time by 18.2%.
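
The abstract describes ranking nodes and then running detection on a minimized graph from which the least influential nodes have been removed. As a rough illustration only, the sketch below uses plain PageRank (power iteration) as a stand-in for the paper's node rank, since the actual algorithm is not given in this preview. The function names, the `drop_fraction` parameter, and the toy edge list are all hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (src, dst) edges.

    Assumption: PageRank stands in for the paper's node rank.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


def minimize_graph(edges, drop_fraction=0.25):
    """Drop the lowest-ranked nodes and every edge that touches them.

    drop_fraction is a hypothetical tuning knob, not a value from the paper.
    """
    rank = pagerank(edges)
    # Sort by rank, breaking ties by name so the result is deterministic.
    ordered = sorted(rank, key=lambda n: (rank[n], n))
    dropped = set(ordered[: int(len(ordered) * drop_fraction)])
    kept = [(s, d) for s, d in edges if s not in dropped and d not in dropped]
    return kept, dropped


# Toy data (hypothetical): officers point to entities, entities to an address.
edges = [
    ("officer_a", "entity_1"),
    ("officer_b", "entity_1"),
    ("officer_c", "entity_1"),
    ("officer_d", "entity_2"),
    ("entity_1", "address_x"),
    ("entity_2", "address_x"),
]

ranks = pagerank(edges)
kept_edges, dropped_nodes = minimize_graph(edges)
```

Any later fraud query would then run over `kept_edges` instead of the full edge list, which is where a reduction in query time would be expected to come from. In a real deployment the ranking and pruning would run inside the distributed graph database rather than in application code.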

Data availability

Not applicable.

Code availability

Not applicable.

Funding

The authors thank the Technical Education Quality Improvement Programme (TEQIP-III) for supporting this research.

Author information

Authors and Affiliations

MNNIT Allahabad, Prayagraj, India
Sakshi Srivastava & Anil Kumar Singh

Corresponding author

Correspondence to Sakshi Srivastava.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Srivastava, S., Singh, A.K. Fraud detection in the distributed graph database. Cluster Comput 26, 515–537 (2023). https://doi.org/10.1007/s10586-022-03540-3

Received: 18 July 2021
Revised: 04 January 2022
Accepted: 05 January 2022
Published: 24 January 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s10586-022-03540-3
