Skip to main content
Log in

MH-DAGMiner: maximal hierarchical sub-DAG mining in directed weighted networks

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

With the easy availability of network data, there is an increasing need for extracting the compact and meaningful directed acyclic graph (DAG) patterns from network databases. There exists no specific method of sub-DAG pattern mining from the network databases with cyclic graphs. This paper presents a novel method, MH-DAGMiner, for mining maximal hierarchical sub-DAG patterns from a directed and weighted network database. To avoid the exhaustive steps of candidate generation, frequency counting and non-maximal patterns pruning, we propose a novel and effective FP-DAG structure based on edge grouping. We propose the optimum branching method to identify the vertex hierarchy within each maximal edge group generated from FP-DAG, and discover the maximal hierarchical sub-DAG patterns. The effectiveness of MH-DAGMiner is tested using several real-world network datasets and synthetic graph datasets. We analyzed the quality of MH-DAGMiner in comparison with a Naive-DAGMiner method which mines DAG patterns from a preprocessed cyclic graph (DAG) database. MH-DAGMiner is found to be more effective than Naive-DAGMiner with lower average information loss. MH-DAGMiner is also found to be more effective than the state-of-the-art benchmarking algorithms such as gSpan, FP-GraphMiner, and MFSH-TreeMiner where maximal hierarchical DAG patterns are identified with a post-processing step.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

Notes

  1. In tree terminology, this is same as maximum spanning forest.

  2. In tree terminology, this is same as maximum spanning tree.

  3. Implementation is available via http://edmonds-alg.sourceforge.net/.

  4. http://snap.stanford.edu/data/higgs-twitter.html,

  5. http://snap.stanford.edu/data/sx-mathoverflow.html.

  6. https://www.cs.cmu.edu/~./enron/.

  7. https://static.aminer.org/lab-datasets/soinf/citation-raw.txt.

  8. https://snap.stanford.edu/data/wiki-talk-temporal.html.

  9. https://snap.stanford.edu/data/CollegeMsg.html.

References

  1. Aggarwal CC, Han J (eds) (2014) Frequent pattern mining. Springer, Berlin, pp 1–17

    MATH  Google Scholar 

  2. Bonchi F (2011) Influence propagation in social networks: a data mining perspective. IEEE Intell Inform Bull 12(1):8–105

    Google Scholar 

  3. Chen YL, Kao HP, Ko MT (2004) Mining DAG patterns from DAG databases. In: International conference on web-age information management, pp 579–588

  4. Cooley R, Mobasher B, Srivastava J (1997) Web mining: information and pattern discovery on the world wide web. In: IEEE international conference on tools with artificial intelligence, pp 558–567

  5. Edmonds J (1968) Optimum branchings. Math Decis Sci 1:335–345

    MATH  Google Scholar 

  6. Fariha A, Ahmed CF, Leung CK et al (2015) A new framework for mining frequent interaction patterns from meeting databases. Eng Appl Artif Intell 45:103–118

    Article  Google Scholar 

  7. Gupte M, Shankar P, Li J et al (2011) Finding hierarchy in directed online social networks. In: International conference on world wide web WWW ’11, pp 557–566

  8. Huan J, Wang W, Prins J (2003) Efficient mining of frequent subgraphs in the presence of isomorphism. In: IEEE international conference on data mining (ICDM), pp 549–552

  9. Huan J, Wang W, Prins J et al (2004) SPIN: mining maximal frequent subgraphs from graph databases. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 581–586

  10. Hwang S, Wei C, Yang W (2004) Discovery of temporal patterns from process instances. Comput Ind 53(3):345–364

    Article  Google Scholar 

  11. Inokuchi A, Washio T, Motoda H (2000) An a priori-based algorithm for mining frequent substructures from graph data. In: European conference on principles of data mining and knowledge discovery, pp 13–23

  12. Jiadong R, HuiFang W, Yue M et al (2015) Efficient software fault localization by hierarchical instrumentation and maximal frequent subgraph mining. Int J Innov Comput Inf Control 11(6):1897–1911

    Google Scholar 

  13. Jiang C, Coenen F, Zito M (2013) A survey of frequent subgraph mining algorithms. Knowl Eng Rev 28(1):75–105

    Article  Google Scholar 

  14. Johnsonbaugh R, Kalin M (1991) A graph generation software package. ACM SIGCSE Bull 23(1):151–154

    Article  Google Scholar 

  15. Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: ACM SIGKDD international conference on Knowledge discovery and data mining, pp 137–146

  16. Leung CW (2010) Technical notes on extending gSpan to directed graphs. Technical Report, Management University, Singapore

  17. Li Y, Lin Q, Zhong G, Duan D et al (2009) A directed labeled graph frequent pattern mining algorithm based on minimum code. In: International conference on multimedia and ubiquitous engineering, pp 353–359

  18. Nijssen S, Kok J N (2004) A quickstart in frequent structure mining can make a difference. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 647–652

  19. Sreenivasa GJ, Ananthanarayana VS (2006) Efficient mining of frequent rooted continuous directed subgraphs. In: International conference on advanced computing and communications (ADCOM), pp 553–558

  20. Tarjan RE (1977) Finding optimum branchings. Networks 7(1):25–3

    Article  MathSciNet  MATH  Google Scholar 

  21. Termier A, Tamada Y, Numata K et al (2007) DIGDAG, a first algorithm to mine closed frequent embedded sub-DAGs. In: Mining and learning with graphs workshop (MLG’07), pp 41–45

  22. Thomas LT, Valluri SR, Karlapalem K (2010) MARGIN: maximal frequent subgraph mining. ACM Trans Knowl Discov Data (TKDD) 4(3):10:1–10:42

    Google Scholar 

  23. Van der Aalst W, Weijters T, Maruster L (2004) Workflow mining: discovering process models from event logs. IEEE Trans Knowl Data Eng 16(9):1128–1142

    Article  Google Scholar 

  24. Vijayalakshmi R, Rethnasamy N, Roddick JF et al (2001) FP-GraphMiner: a fast frequent pattern mining algorithm for network graphs. J Graph Algorithms Appl 15(6):753–776

    Article  MathSciNet  MATH  Google Scholar 

  25. Werth T, Dreweke A, Wörlein M et al (2008) DAGMA: mining directed acyclic graphs. In: IADIS European conference on data mining, pp 11–18

  26. Yan X, Han J (2002) Gspan: graph-based substructure pattern mining. In: IEEE international conference on data mining (ICDM ’02), pp 721–724

  27. Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 286–295

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to T. M. G. Tennakoon.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Tennakoon, T.M.G., Nayak, R. MH-DAGMiner: maximal hierarchical sub-DAG mining in directed weighted networks. Knowl Inf Syst 61, 431–462 (2019). https://doi.org/10.1007/s10115-018-1300-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-018-1300-0

Keywords

Navigation