Graph-based substructure pattern mining with edge-weight

Applied Intelligence

Abstract

To represent complex inter-relationships among entities, weighted graphs are more useful than their unweighted counterparts. In a transactional graph setting, researchers have made several attempts to mine weighted frequent subgraphs from a collection of edge-weighted graphs; these subgraphs serve as representative features of the underlying graph database and can be used for further analysis. However, the weighted support of a pattern does not satisfy the downward closure property, which frequent pattern mining commonly relies on to control the search space, and this makes weighted frequent substructure mining a tremendously difficult task. This article proposes an efficient weighted frequent subgraph mining framework called WFSM-MaxPWS for graphs with static edge weights. We introduce a new pruning technique called MaxPWS pruning which, together with canonical labeling of subgraphs, reduces the search space significantly without compromising completeness. Extending the WFSM-MaxPWS framework, we propose another framework called DewgSpan that is capable of mining graphs with dynamic edge weights. DewgSpan utilizes a summarized edge-weight distribution table to overcome the new challenges of dynamic edge-weight settings. Evaluation results show that WFSM-MaxPWS and DewgSpan are significantly faster than the existing MaxW pruning technique of weighted pattern mining.
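The loss of downward closure mentioned above can be illustrated with a small Python sketch. It is not the article's algorithm. It assumes one common definition of weighted support (the average edge weight of a pattern multiplied by its relative support), and it simplifies graphs to sets of uniquely labeled edges so that containment is a plain subset test rather than a subgraph isomorphism check. The edge names, weights, and threshold are hypothetical.

```python
from typing import Dict, FrozenSet, List

Edge = str  # hypothetical edge identifier, e.g. "ab" for an edge between a and b


def support(pattern: FrozenSet[Edge], database: List[FrozenSet[Edge]]) -> float:
    """Fraction of graphs in the database that contain every edge of the pattern."""
    return sum(1 for graph in database if pattern <= graph) / len(database)


def weighted_support(
    pattern: FrozenSet[Edge],
    database: List[FrozenSet[Edge]],
    weights: Dict[Edge, float],
) -> float:
    """Assumed definition: average edge weight of the pattern times its support."""
    avg_weight = sum(weights[e] for e in pattern) / len(pattern)
    return avg_weight * support(pattern, database)


# Static edge weights (hypothetical values).
weights = {"ab": 0.2, "bc": 0.9}

# Three toy graphs, each reduced to its set of edges.
database = [
    frozenset({"ab", "bc"}),
    frozenset({"ab", "bc"}),
    frozenset({"ab"}),
]

small = frozenset({"ab"})        # sub-pattern
large = frozenset({"ab", "bc"})  # super-pattern that extends `small`

min_wsup = 0.3  # hypothetical weighted-support threshold

ws_small = weighted_support(small, database, weights)
ws_large = weighted_support(large, database, weights)

# Plain support is anti-monotone: the super-pattern is never more frequent.
print(support(small, database) >= support(large, database))
# Weighted support is not: the super-pattern passes the threshold,
# while its own sub-pattern fails it.
print(ws_small < min_wsup <= ws_large)
```

Under these assumptions the heavy edge "bc" lifts the larger pattern above the threshold even though its sub-pattern falls below it, so a miner cannot discard a pattern's extensions merely because the pattern itself is weighted-infrequent. This is why an upper-bound measure is needed for safe pruning.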

Data Availability

The data that support the findings of this study are available at https://github.com/cseduashraful/graphdatasets

Notes

  1. https://pubchem.ncbi.nlm.nih.gov/bioassay/167

  2. https://github.com/cseduashraful/graphdatasets

References

  1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: VLDB’94, Proceedings of 20th international conference on very large data bases, September 12-15, 1994, Santiago de Chile, Chile, pp 487–499

  2. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: SIGMOD ’00: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp 1–12. ACM

  3. Han J, Pei J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M (2001) Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the 17th international conference on data engineering, pp 215–224

  4. Islam MA, Rafi MR, Azad Aa, Ovi JA (2022) Weighted frequent sequential pattern mining. Appl Intell 52(1):254–281

  5. Nguyen H, Le T, Nguyen M, Fournier-Viger P, Tseng VS, Vo B (2022) Mining frequent weighted utility itemsets in hierarchical quantitative databases. Knowledge-Based Systems 237:107709

  6. Roy KK, Moon MHH, Rahman MM, Ahmed CF, Leung CK (2021) Mining sequential patterns in uncertain databases using hierarchical index structure. In: Advances in knowledge discovery and data mining: 25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, May 11–14, 2021, Proceedings, Part II, Springer, pp 29–41

  7. Leung CKS, Tanbeer SK (2013) PUF-tree: a compact tree structure for frequent pattern mining of uncertain data. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, pp 13–25

  8. Wang J, Liu C, Fu X, Luo X, Li X (2019) A three-phase approach to differentially private crucial patterns mining over data streams. Computers & Security 82:30–48

  9. Tsuda K, Kudo T (2006) Clustering graphs by weighted substructure mining. In: Proceedings of the 23rd international conference on Machine learning, ACM, pp 953–960

  10. Cheng Z, Flouvat F, Selmaoui-Folcher N (2017) Mining recurrent patterns in a dynamic attributed graph. In: Proceedings of 21st Pacific-Asia conference on knowledge discovery and data mining (PAKDD 2017), Part II, pp 631–643

  11. Huang Z, Ye Y, Li X, Liu F, Chen H (2017) Joint weighted nonnegative matrix factorization for mining attributed graphs. In: Proceedings of 21st Pacific-asia conference on knowledge discovery and data mining (PAKDD 2017), Part I, pp 368–380

  12. Khan A, Akcora CG (2022) Graph-based management and mining of blockchain data. In: Proceedings of the 31st ACM international conference on information & knowledge management, pp 5140–5143

  13. Ning B, Sun Y, Tao X, Li G (2021) Differential privacy protection on weighted graph in wireless networks. Ad hoc networks 110:102303

  14. Gu Z, Liu H, Feng S (2022) Diversity-induced consensus and structured graph learning for multi-view clustering. Appl Intell pp 1–15

  15. Li K, Ye W (202) Semi-supervised node classification via graph learning convolutional neural network. Appl Intell pp 1–13

  16. Ju W, Qin Y, Qiao Z, Luo X, Wang Y, Fu Y, Zhang M (2022) Kernel-based substructure exploration for next poi recommendation. In: 2022 IEEE International conference on data mining (ICDM), IEEE, pp 221–230

  17. Zhang Z, Bu J, Ester M, Li Z, Yao C, Yu Z, Wang C (2021) H2MN: Graph similarity learning with hierarchical hypergraph matching networks. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pp 2274–2284

  18. Yan X, Han J (2002) gSpan: Graph-based substructure pattern mining. In: 2002 IEEE International conference on data mining, IEEE, pp 721–724

  19. Nijssen S, Kok JN (2005) The gaston tool for frequent subgraph mining. Electronic Notes in Theoretical Computer Science 127(1):77–87

  20. Nguyen D, Luo W, Nguyen TD, Venkatesh S, Phung D (2018) Learning graph representation via frequent subgraphs. In: Proceedings of the 2018 SIAM International Conference on Data Mining, SIAM, pp 306–314

  21. Alam MT, Ahmed CF, Samiullah M, Leung CK (2021) Discriminating frequent pattern based supervised graph embedding for classification. In: Advances in knowledge discovery and data mining, pp 16–28

  22. Nowozin S, Tsuda K, Uno T, Kudo T, BakIr G (2007) Weighted substructure mining for image analysis. In: 2007 IEEE Conference on computer vision and pattern recognition, IEEE, pp 1–8

  23. Henderson TA, Podgurski A (2018) Behavioral fault localization by sampling suspicious dynamic control flow subgraphs. In: 2018 IEEE 11th International conference on software testing, verification and validation (ICST), IEEE, pp 93–104

  24. Salehi Z, Ghiasi M, Sami A (2012) A miner for malware detection based on API function calls and their arguments. In: Artificial intelligence and signal processing (AISP), 2012 16th CSI International Symposium on, IEEE, pp 563–568

  25. Du Y, Wang J, Li Q (2017) An android malware detection approach using community structures of weighted function call graphs. IEEE Access 5:17478–17486

  26. Lakhotia A, Preda MD, Giacobazzi R (2013) Fast location of similar code fragments using semantic 'juice'. In: Proceedings of the 2nd ACM SIGPLAN program protection and reverse engineering workshop, ACM, pp 5

  27. Ahmed CF, Tanbeer SK, Jeong BS, Lee YK, Choi HJ (2012) Single-pass incremental and interactive mining for weighted frequent patterns. Expert Systems with Applications 39(9):7976–7994

  28. Zou Z, Li J, Gao H, Zhang S (2010) Mining frequent subgraph patterns from uncertain graph data. IEEE Transactions on Knowledge and Data Engineering 22(9):1203–1218

  29. Bogdanov P, Mongiovì M, Singh AK (2011) Mining heavy subgraphs in time-evolving networks. In: 2011 IEEE 11th International conference on data mining, IEEE, pp 81–90

  30. Rozenshtein P, Gionis A (2019) Mining temporal networks. In: Proceedings of the 25th ACM SIGKDD International conference on knowledge discovery & data mining, ACM, pp 3225–3226

  31. Petelin B, Kononenko I, Malačič V, Kukar M (2019) Frequent subgraph mining in oceanographic multi-level directed graphs. Int J Geographical Inf Sci 1–24

  32. Gong Y, Jia L (2019) Research on SVM environment performance of parallel computing based on large data set of machine learning. J Supercomput 1–18

  33. Eichinger F, Böhm K, Huber M (2008) Mining edge-weighted call graphs to localise software bugs. In: Joint european conference on machine learning and knowledge discovery in databases, Springer, pp 333–348

  34. Jiang C, Coenen F (2008) Graph-based image classification by weighting scheme. In: International conference on innovative techniques and applications of artificial intelligence, Springer, pp 63–76

  35. Shinoda M, Ozaki T, Ohkawa T (2009) Weighted frequent subgraph mining in weighted graph databases. In: 2009 IEEE International conference on data mining workshops, IEEE, pp 58–63

  36. Ozaki T, Etoh M (2011) Closed and maximal subgraph mining in internally and externally weighted graph databases. In: Proceedings of the 2011 IEEE International conference on advanced information networking and applications (AINA 2011) Workshops, IEEE, pp 626–631

  37. Alam MT, Roy A, Ahmed CF, Islam MA, Leung CK (2023) UGMINE: utility-based graph mining. Applied Intelligence 53(1):49–68

  38. Eichinger F, Huber M, Böhm K (2010) On the usefulness of weight-based constraints in frequent subgraph mining. In: SGAI Conf., Springer, pp 65–78

  39. Jiang C, Coenen F, Zito M (2010) Frequent sub-graph mining on edge weighted graphs. In: International conference on data warehousing and knowledge discovery, Springer, pp 77–88

  40. Jiang C, Coenen F, Zito M (2010) Finding frequent subgraphs in longitudinal social network data using a weighted graph mining approach. Adv Data Mining Appl 405–416

  41. Elsayed A, Coenen F, Jiang C, Garcia-Finana M, Sluming V (2010) Corpus callosum mr image classification. Knowledge-Based Systems 23(4):330–336

  42. Jiang C, Coenen F, Sanderson R, Zito M (2010) Text classification using graph mining-based feature extraction. In: Research and Development in Intelligent Systems XXVI, Springer, pp 21–34

  43. Lee G, Yun U (2012) Mining weighted frequent sub-graphs with weight and support affinities. In: International workshop on multi-disciplinary trends in artificial intelligence, Springer, pp 224–235

  44. Lee G, Yun U, Kim D (2016) A weight-based approach: frequent graph pattern mining with length-decreasing support constraints using weighted smallest valid extension. Advanced Science Letters 22(9):2480–2484

  45. Babu N, John A (2016) A distributed approach to weighted frequent subgraph mining. In: International conference on emerging technological trends [ICETT], IEEE, pp 1–7

  46. Gupta A, Thakur H, Gupta T, Yadav S (2017) Regular pattern mining (with jitter) on weighted-directed dynamic graphs. Journal of Engineering Science and Technology 12(2):349–364

  47. Le NT, Vo B, Nguyen LB, Fujita H, Le B (2020) Mining weighted subgraphs in a single large graph. Information Sciences 514:149–165

  48. Le NT, Vo B, Nguyen LB, Le B (2022) OWGraMi: Efficient method for mining weighted subgraphs in a single graph. Expert Syst Appl 117625

  49. Ashraf N, Haque RR, Islam M, Ahmed CF, Leung CK, Mai JJ, Wodi BH et al (2019) WeFreS: weighted frequent subgraph mining in a single large graph. In: Industrial conference on data mining. ibai publishing

  50. Islam MA, Ahmed CF, Leung CK, Hoi CS (2018) WFSM-MaxPWS: an efficient approach for mining weighted frequent subgraphs from edge-weighted graph databases. In: Pacific-Asia conference on knowledge discovery and data mining, Springer, pp 664–676

  51. Zaki MJ, Meira W (2014) Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, New York, NY, USA

  52. Yan X Graph datasets. http://www.cs.ucsb.edu/~xyan/dataset.htm

  53. Mehmood D, Shafiq B, Vaidya J, Hong Y, Adam N, Atluri V (2012) Privacy-preserving subgraph discovery. In: IFIP Annual conference on data and applications security and privacy, Springer, pp 161–176

Acknowledgements

Of the two frameworks discussed in this article, a preliminary version of the framework for static edge-weighted substructure mining was previously published at PAKDD 2018 [50].

Funding

This work is partially supported by (a) University of Dhaka, (b) Natural Sciences and Engineering Research Council of Canada (NSERC), and (c) University of Manitoba.

Author information

Contributions

Conceptualization: Md. Ashraful Islam & Chowdhury Farhan Ahmed; Methodology: Md. Ashraful Islam; Formal analysis and investigation: Md. Ashraful Islam, Chowdhury Farhan Ahmed & Md. Tanvir Alam; Writing - original draft preparation: Md. Ashraful Islam & Md. Tanvir Alam; Writing - review and editing: Chowdhury Farhan Ahmed & Carson Kai-Sang Leung; Supervision: Chowdhury Farhan Ahmed & Carson Kai-Sang Leung.

Corresponding author

Correspondence to Md. Ashraful Islam.

Ethics declarations

Competing Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Ethical and informed consent for data used:

The data used are open-source and have no associated privacy or copyright issues.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Islam, M.A., Ahmed, C.F., Alam, M.T. et al. Graph-based substructure pattern mining with edge-weight. Appl Intell 54, 3756–3785 (2024). https://doi.org/10.1007/s10489-024-05356-7

  • DOI: https://doi.org/10.1007/s10489-024-05356-7
