Graph mining for discovering infrastructure patterns in configuration management databases

Anchuri, Pranay; Zaki, Mohammed J.; Barkol, Omer; Bergman, Ruth; Felder, Yifat; Golan, Shahar; Sityon, Arik

doi:10.1007/s10115-012-0528-3

Regular paper
Published: 10 August 2012

Volume 33, pages 491–522, (2012)

Knowledge and Information Systems

Pranay Anchuri¹,
Mohammed J. Zaki¹,
Omer Barkol²,
Ruth Bergman²,
Yifat Felder²,
Shahar Golan² &
…
Arik Sityon²

Abstract

A configuration management database (CMDB) can be considered to be a large graph representing the IT infrastructure entities and their interrelationships. Mining such graphs is challenging because they are large, complex, and multi-attributed and have many repeated labels. These characteristics pose challenges for graph mining algorithms, due to the increased cost of subgraph isomorphism (for support counting) and graph isomorphism (for eliminating duplicate patterns). The notion of pattern frequency or support is also more challenging in a single graph, since it has to be defined in terms of the number of its (potentially, exponentially many) embeddings. We present CMDB-Miner, a novel two-step method for mining infrastructure patterns from CMDB graphs. It first samples the set of maximal frequent patterns and then clusters them to extract the representative infrastructure patterns. We demonstrate the effectiveness of CMDB-Miner on real-world CMDB graphs, as well as synthetic graphs.
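
To make the point about embeddings concrete, the following is a minimal brute-force sketch, not the CMDB-Miner algorithm itself. It enumerates the label-preserving embeddings of a small pattern in a single graph and shows how repeated labels inflate the count. The toy graph, the node names (`s1`, `d1`, ...), the labels (`server`, `db`) and the function `find_embeddings` are all hypothetical and chosen only for illustration.

```python
def find_embeddings(p_labels, p_edges, g_labels, g_edges):
    """Return every injective, label-preserving mapping of pattern nodes
    to graph nodes under which each pattern edge lands on a graph edge.
    Plain backtracking; exponential in the worst case."""
    # Adjacency sets for the undirected host graph and the pattern.
    g_adj = {v: set() for v in g_labels}
    for u, v in g_edges:
        g_adj[u].add(v)
        g_adj[v].add(u)
    p_adj = {v: set() for v in p_labels}
    for u, v in p_edges:
        p_adj[u].add(v)
        p_adj[v].add(u)

    p_nodes = list(p_labels)
    results = []

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            results.append(dict(mapping))
            return
        p = p_nodes[len(mapping)]  # next pattern node to place
        for g in g_labels:
            if g_labels[g] != p_labels[p] or g in mapping.values():
                continue  # wrong label, or graph node already used
            # Already-mapped pattern neighbours must be graph neighbours.
            if all(mapping[q] in g_adj[g] for q in p_adj[p] if q in mapping):
                mapping[p] = g
                extend(mapping)
                del mapping[p]

    extend({})
    return results


# Hypothetical toy graph: one server attached to four databases.
g_labels = {"s1": "server", "d1": "db", "d2": "db", "d3": "db", "d4": "db"}
g_edges = [("s1", "d1"), ("s1", "d2"), ("s1", "d3"), ("s1", "d4")]

# Pattern: a server attached to two databases.
p_labels = {"a": "server", "b": "db", "c": "db"}
p_edges = [("a", "b"), ("a", "c")]

embeddings = find_embeddings(p_labels, p_edges, g_labels, g_edges)
# Distinct sets of graph nodes covered by at least one embedding.
occurrences = {frozenset(m.values()) for m in embeddings}

print(len(embeddings))   # 12 ordered mappings (4 choices for b, then 3 for c)
print(len(occurrences))  # 6 distinct node sets
```

Even in this five-node graph a single pattern yields twelve embeddings over six distinct node sets, and every additional `db` neighbour increases both counts, which is why a support definition based on raw embeddings needs care in a single large graph with many repeated labels.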

Author information

Authors and Affiliations

Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
Pranay Anchuri & Mohammed J. Zaki
HP Labs, Technion City, Haifa, 32000, Israel
Omer Barkol, Ruth Bergman, Yifat Felder, Shahar Golan & Arik Sityon

Corresponding author

Correspondence to Mohammed J. Zaki.

Additional information

This work was supported by the HP Labs Innovation Research Program Award, and in part by NSF Grant EMT-0829835.

About this article

Cite this article

Anchuri, P., Zaki, M.J., Barkol, O. et al. Graph mining for discovering infrastructure patterns in configuration management databases. Knowl Inf Syst 33, 491–522 (2012). https://doi.org/10.1007/s10115-012-0528-3

Received: 09 March 2012
Revised: 16 April 2012
Accepted: 14 July 2012
Published: 10 August 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s10115-012-0528-3
