FOGGER: an algorithm for graph generator discovery

Authors:
Zhiping Zeng, Tsinghua University, Beijing, P.R. China
Jianyong Wang, Tsinghua University, Beijing, P.R. China
Jun Zhang, Tsinghua University, Beijing, P.R. China
Lizhu Zhou, Tsinghua University, Beijing, P.R. China

EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 2009, Pages 517–528
https://doi.org/10.1145/1516360.1516421

Published: 24 March 2009

ABSTRACT

To the best of our knowledge, all existing graph pattern mining algorithms mine only closed subgraphs, maximal subgraphs, or the complete set of frequent subgraphs. None mines graph generators, which in some applications are preferable to closed subgraphs according to the Minimum Description Length principle. In this paper, we study a new frequent subgraph mining problem, frequent connected graph generator mining, which poses significant challenges because of the inherent complexity of frequent subgraph mining and the absence of the Apriori property for graph generators. Nevertheless, we present an efficient solution, FOGGER, for this new problem. By exploring properties of graph generators, we propose two effective pruning techniques, backward edge pruning and forward edge pruning, which prune the branches of the well-known DFS code enumeration tree that do not contain graph generators. To further improve efficiency, we also devise an index structure, ADI++, to speed up subgraph isomorphism checking. We experimentally evaluate various aspects of FOGGER using both real and synthetic datasets. Our results demonstrate that the two pruning techniques are effective in pruning unpromising parts of the search space, and that FOGGER is efficient and scalable in terms of the base size of input databases. In addition, a performance study of a graph generator-based classification model shows that it is much simpler than a closed subgraph-based model and achieves almost the same accuracy when classifying chemical compounds.
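
To make the distinction between generators and closed subgraphs concrete, here is a minimal brute-force sketch in Python. It is not the FOGGER algorithm and uses none of its pruning or indexing. It assumes the usual definition, which the abstract does not spell out: a frequent connected pattern is a generator if no proper connected sub-pattern has the same support, and it is closed if no proper super-pattern has the same support. It further assumes that vertex labels are unique within each graph, so that subgraph containment reduces to edge-set inclusion and no isomorphism test is needed. All function names and the toy database are hypothetical.

```python
from itertools import combinations

def support(pattern, database):
    """Count the database graphs that contain every edge of the pattern."""
    return sum(1 for g in database if pattern <= g)

def is_connected(edges):
    """Return True if the edge set forms a single connected component."""
    edges = list(edges)
    if not edges:
        return False
    seen = set(edges[0])
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            # Grow the component across edges with exactly one seen endpoint.
            if (u in seen) != (v in seen):
                seen.update((u, v))
                changed = True
    return all(u in seen and v in seen for u, v in edges)

def frequent_connected_patterns(database, min_sup):
    """Enumerate every connected edge subset with support >= min_sup.

    Exhaustive enumeration: exponential, suitable for toy inputs only.
    """
    all_edges = sorted(set().union(*database))
    found = {}
    for k in range(1, len(all_edges) + 1):
        for combo in combinations(all_edges, k):
            p = frozenset(combo)
            if is_connected(p):
                s = support(p, database)
                if s >= min_sup:
                    found[p] = s
    return found

def generators(patterns):
    """Keep patterns with no proper connected sub-pattern of equal support.

    Any sub-pattern of a frequent pattern is itself frequent, so it is
    enough to compare against the frequent patterns already found.
    """
    return {p: s for p, s in patterns.items()
            if not any(q < p and patterns[q] == s for q in patterns)}

def closed(patterns):
    """Keep patterns with no proper super-pattern of equal support."""
    return {p: s for p, s in patterns.items()
            if not any(p < q and patterns[q] == s for q in patterns)}

# Toy database (assumption): three graphs given as sets of labeled edges.
db = [
    frozenset({("A", "B"), ("B", "C")}),
    frozenset({("A", "B"), ("B", "C"), ("C", "D")}),
    frozenset({("B", "C")}),
]
freq = frequent_connected_patterns(db, min_sup=2)
print("generators:", sorted(sorted(p) for p in generators(freq)))
print("closed:", sorted(sorted(p) for p in closed(freq)))
```

In this toy database the generators are the two single-edge patterns, while the closed patterns include the two-edge path, so the generator set describes the same support information with fewer edges. Real graph databases have repeated labels, which is why a practical miner needs subgraph isomorphism checking rather than plain set inclusion.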

Published in
EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
March 2009
1180 pages
ISBN: 9781605584225
DOI: 10.1145/1516360
Editors:
Martin Kersten, CWI, The Netherlands
Boris Novikov, University of Saint Petersburg, Russia
Jens Teubner, ETH Zurich, Switzerland
Vladimir Polutin, HP Labs, Russia
Stefan Manegold, CWI, The Netherlands
Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph classification
graph generator
graph mining
Qualifiers
- research-article