Fraud detection in the distributed graph database

Cluster Computing

Abstract

Over the last few decades, graphs have become increasingly important for managing big data in many applications and domains. Big data analysis in a graph database is the analysis of massive, interconnected data that grows exponentially over time. Analyzing such connected data is challenging, particularly in social networks and in synthetic identity detection. Previous approaches run fraud detection on the complete graph, which is time-consuming and creates bottlenecks during query execution. To overcome this issue, this paper proposes a new fraud detection technique to unveil synthetic identities in the Panama Papers leak dataset (an unprecedented leak of 11.5 million documents from the database of the world's fourth-biggest offshore law firm, Mossack Fonseca). The technique applies a Node rank-based fraud detection algorithm, integrated with distributed data profiling techniques, to a graph minimized by pruning the least influential nodes. The proposed model is evaluated on a three-node cluster, where it improves data scalability, reduces query execution time by an average of 30–36%, and reduces fraud detection time by 18.2%.
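The idea of ranking nodes and then discarding the least influential ones before running detection queries can be illustrated with a small sketch. The abstract does not specify how the paper's Node rank score is computed, so the sketch below substitutes plain PageRank as a stand-in; the function names, the `keep_fraction` parameter, and the toy graph are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Set

# Assumption: a directed graph stored as node -> set of successor nodes,
# where every successor also appears as a key.
Graph = Dict[str, Set[str]]


def pagerank(graph: Graph, damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank, used here as a stand-in influence score."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


def prune_least_influential(graph: Graph, keep_fraction: float = 0.5) -> Graph:
    """Keep the top `keep_fraction` of nodes by score and the edges among them."""
    rank = pagerank(graph)
    keep_count = max(1, int(len(graph) * keep_fraction))
    kept = set(sorted(graph, key=lambda v: rank[v], reverse=True)[:keep_count])
    return {v: graph[v] & kept for v in graph if v in kept}


# Hypothetical toy graph loosely shaped like officer -> entity -> intermediary links.
toy: Graph = {
    "officer_a": {"entity_x"},
    "officer_b": {"entity_x"},
    "officer_c": {"entity_x", "entity_y"},
    "entity_x": {"intermediary"},
    "entity_y": set(),
    "intermediary": set(),
}

minimized = prune_least_influential(toy, keep_fraction=0.5)
print(sorted(minimized))
```

In this toy graph the three officer nodes have no incoming edges, so they receive the lowest scores and are pruned; later queries would then run on the smaller graph. How the paper chooses its pruning threshold, and how it distributes the work across the cluster, is not stated in the abstract and is not modeled here.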

Data availability

Not applicable.

Code availability

Not applicable.

Funding

The authors thank the Technical Education Quality Improvement Programme (TEQIP-III) for supporting this research.

Author information

Corresponding author

Correspondence to Sakshi Srivastava.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Srivastava, S., Singh, A.K. Fraud detection in the distributed graph database. Cluster Comput 26, 515–537 (2023). https://doi.org/10.1007/s10586-022-03540-3

  • DOI: https://doi.org/10.1007/s10586-022-03540-3
