Graph mining for discovering infrastructure patterns in configuration management databases

  • Regular paper
Knowledge and Information Systems

Abstract

A configuration management database (CMDB) can be viewed as a large graph representing IT infrastructure entities and their interrelationships. Mining such graphs is challenging because they are large, complex, and multi-attributed, and have many repeated labels. These characteristics increase the cost of subgraph isomorphism (for support counting) and graph isomorphism (for eliminating duplicate patterns). The notion of pattern frequency, or support, is also harder to define in a single graph, since it has to be expressed in terms of the number of a pattern's (potentially exponentially many) embeddings. We present CMDB-Miner, a novel two-step method for mining infrastructure patterns from CMDB graphs. It first samples the set of maximal frequent patterns and then clusters them to extract the representative infrastructure patterns. We demonstrate the effectiveness of CMDB-Miner on real-world CMDB graphs, as well as synthetic graphs.
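
The point about support in a single graph can be made concrete with a small sketch. The code below is an illustration only and is not CMDB-Miner's support definition, which this abstract does not spell out. It brute-forces all label-preserving embeddings of a pattern in a toy graph and compares the raw embedding count with minimum image based support, one measure known from the single-graph mining literature. The toy graph (one `server` node linked to four `app` nodes), the pattern, and the function names `embeddings` and `min_image_support` are all assumptions made for this example.

```python
from itertools import permutations


def embeddings(pattern_labels, pattern_edges, graph_labels, graph_edges):
    """Enumerate injective, label-preserving maps from pattern nodes to
    graph nodes under which every pattern edge lands on a graph edge.
    Brute force over all ordered node tuples, so toy inputs only."""
    graph_edge_set = {frozenset(e) for e in graph_edges}
    p_nodes = list(pattern_labels)
    found = []
    for image in permutations(graph_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if any(pattern_labels[p] != graph_labels[mapping[p]] for p in p_nodes):
            continue  # a node label does not match
        if all(frozenset((mapping[u], mapping[v])) in graph_edge_set
               for u, v in pattern_edges):
            found.append(mapping)
    return found


def min_image_support(pattern_nodes, maps):
    """Minimum image based support: for each pattern node, count the
    distinct graph nodes it is mapped to, then take the minimum."""
    if not maps:
        return 0
    return min(len({m[p] for m in maps}) for p in pattern_nodes)


# Hypothetical toy graph with repeated labels: one server, four apps.
graph_labels = {"s": "server", "a0": "app", "a1": "app", "a2": "app", "a3": "app"}
graph_edges = [("s", "a0"), ("s", "a1"), ("s", "a2"), ("s", "a3")]

# Hypothetical pattern: app - server - app.
pattern_labels = {"x": "app", "y": "server", "z": "app"}
pattern_edges = [("x", "y"), ("y", "z")]

maps = embeddings(pattern_labels, pattern_edges, graph_labels, graph_edges)
print("embeddings:", len(maps))
print("min image support:", min_image_support(pattern_labels, maps))
```

With four `app` nodes the pattern has 4 x 3 = 12 embeddings, all of which share the same `server` node, so the minimum image based support is 1. Adding more `app` leaves to the pattern or to the graph makes the embedding count grow combinatorially while the single shared server keeps the support at 1, which is the kind of gap that repeated labels create.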

Author information

Corresponding author

Correspondence to Mohammed J. Zaki.

Additional information

This work was supported by the HP Labs Innovation Research Program Award, and in part by NSF Grant EMT-0829835.

About this article

Cite this article

Anchuri, P., Zaki, M.J., Barkol, O. et al. Graph mining for discovering infrastructure patterns in configuration management databases. Knowl Inf Syst 33, 491–522 (2012). https://doi.org/10.1007/s10115-012-0528-3

  • DOI: https://doi.org/10.1007/s10115-012-0528-3
