Abstract
A configuration management database (CMDB) can be viewed as a large graph representing IT infrastructure entities and their interrelationships. Such graphs are large, complex, and multi-attributed, and they contain many repeated labels. These characteristics make mining them challenging because they increase the cost of subgraph isomorphism (needed for support counting) and graph isomorphism (needed for eliminating duplicate patterns). The notion of pattern frequency, or support, is also harder to define in a single graph, since it must be expressed in terms of the number of embeddings of the pattern, of which there may be exponentially many. We present CMDB-Miner, a novel two-step method for mining infrastructure patterns from CMDB graphs. It first samples the set of maximal frequent patterns and then clusters them to extract the representative infrastructure patterns. We demonstrate the effectiveness of CMDB-Miner on real-world CMDB graphs as well as on synthetic graphs.
Additional information
This work was supported by the HP Labs Innovation Research Program Award, and in part by NSF Grant EMT-0829835.
Cite this article
Anchuri, P., Zaki, M.J., Barkol, O. et al. Graph mining for discovering infrastructure patterns in configuration management databases. Knowl Inf Syst 33, 491–522 (2012). https://doi.org/10.1007/s10115-012-0528-3
DOI: https://doi.org/10.1007/s10115-012-0528-3