An efficient and scalable approach for mining subgraphs in a single large graph

Applied Intelligence

Abstract

In many recent applications, graphs are used to model complex systems in areas such as social networks, traffic modelling, and bioinformatics, and the underlying graphs for these systems are very large. Algorithms for mining all frequent subgraphs from a single large graph have attracted much attention and have been studied in more detail recently. Frequent subgraph mining is an important task, defined as finding all subgraphs whose number of occurrences in a dataset is greater than or equal to a given frequency threshold. Among frequent subgraph mining algorithms, GraMi is considered the state-of-the-art approach. However, GraMi has a huge search space and therefore still needs a lot of time and memory to process a large graph. In this paper, we propose two effective strategies to balance and reduce the search space of GraMi, which decrease the number of candidate subgraphs generated and prune a large portion of the domain for each candidate early. Our experiments were performed on four real datasets, and the results show that the performance of our balancing GraMi is better than that of the original GraMi algorithm and the optimized version SoGraMi.
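
To make the definition of a frequent subgraph concrete, the following minimal Python sketch counts the occurrences of the simplest possible patterns (single labeled edges) in one small graph and keeps those that meet a threshold. It is an illustration only and is not GraMi or the method proposed in the paper: the function name `frequent_edge_patterns`, the toy graph, and the plain occurrence count used as the support measure are all assumptions made for this example.

```python
from collections import Counter


def frequent_edge_patterns(node_labels, edges, threshold):
    """Return one-edge patterns whose occurrence count is >= threshold.

    node_labels: dict mapping node id -> label (hypothetical toy input)
    edges: iterable of (u, v) pairs of an undirected graph
    threshold: minimum number of occurrences for a pattern to be frequent
    """
    counts = Counter()
    for u, v in edges:
        # Sort the two labels so that (A, B) and (B, A) count as the
        # same undirected one-edge pattern.
        pattern = tuple(sorted((node_labels[u], node_labels[v])))
        counts[pattern] += 1
    # Keep only patterns that reach the frequency threshold.
    return {p: c for p, c in counts.items() if c >= threshold}


# Toy single graph: five labeled nodes and four undirected edges.
node_labels = {1: "A", 2: "B", 3: "A", 4: "B", 5: "C"}
edges = [(1, 2), (3, 4), (2, 3), (4, 5)]

result = frequent_edge_patterns(node_labels, edges, threshold=2)
print(result)
```

Here the pattern A-B occurs three times and B-C once, so only A-B is reported at a threshold of 2. A full miner would extend frequent patterns edge by edge and would need a support measure suited to overlapping occurrences, which this sketch does not attempt.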

Acknowledgements

This work was supported by the Institute for Computational Science and Technology (ICST) – Ho Chi Minh City and the Department of Science and Technology (DOST) – Ho Chi Minh City under grant no. 23/2021/HĐ-QKHCN.

We are especially thankful to Mohammed Elseidy, who provided the GraMi source code and two datasets, MiCo and CiteSeer.

Author information

Corresponding author

Correspondence to Bay Vo.

About this article

Cite this article

Nguyen, L.B.Q., Nguyen, L.T.T., Vo, B. et al. An efficient and scalable approach for mining subgraphs in a single large graph. Appl Intell 52, 17881–17895 (2022). https://doi.org/10.1007/s10489-022-03164-5

  • DOI: https://doi.org/10.1007/s10489-022-03164-5
