ABSTRACT
To the best of our knowledge, all existing graph pattern mining algorithms mine only the closed, the maximal, or the complete set of frequent subgraphs. None mines graph generators, which in some applications are preferable to closed subgraphs according to the Minimum Description Length principle. In this paper, we study a new frequent subgraph mining problem, frequent connected graph generator mining. The problem poses significant challenges because of the inherent complexity of frequent subgraph mining and the absence of the Apriori property for graph generators. Nevertheless, we present an efficient solution, FOGGER. By exploring properties of graph generators, we propose two effective pruning techniques, backward edge pruning and forward edge pruning, which prune the branches of the well-known DFS code enumeration tree that contain no graph generators. To further improve efficiency, we devise an index structure, ADI++, to facilitate subgraph isomorphism checking. We experimentally evaluate various aspects of FOGGER on both real and synthetic datasets. The results demonstrate that the two pruning techniques are effective in pruning the unpromising parts of the search space, and that FOGGER is efficient and scalable with respect to the base size of the input databases. In addition, a performance study of a graph generator-based classification model shows that the generator-based model is much simpler than the closed subgraph-based model and achieves almost the same accuracy in classifying chemical compounds.