skip to main content
10.1145/1516360.1516421acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free Access

FOGGER: an algorithm for graph generator discovery

Published:24 March 2009Publication History

ABSTRACT

To our best knowledge, all existing graph pattern mining algorithms can only mine either closed, maximal or the complete set of frequent subgraphs instead of graph generators which are preferable to the closed subgraphs according to the Minimum Description Length principle in some applications. In this paper, we study a new problem of frequent subgraph mining, called frequent connected graph generator mining, which poses significant challenges due to the underlying complexity associated with frequent subgraph mining as well as the absence of Apriori property for graph generators. Whereas, we still present an efficient solution FOGGER for this new problem. By exploring some properties of graph generators, two effective pruning techniques, backward edge pruning and forward edge pruning, are proposed to prune the branches of the well-known DFS code enumeration tree that do not contain graph generators. To further improve the efficiency, an effective index structure, ADI++, is also devised to facilitate the subgraph isomorphism checking. We experimentally evaluate various aspects of FOGGER using both real and synthetic datasets. Our results demonstrate that the two pruning techniques are effective in pruning the unpromising parts of search space, and FOGGER is efficient and scalable in terms of the base size of input databases. Meanwhile, the performance study for graph generator-based classification model shows that generator-based model is much simpler and can achieve almost the same accuracy for classifying chemical compounds in comparison with closed subgraph-based model.

References

  1. Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In CL '00: Proceedings of the First International Conference on Computational Logic, pages 972--986, London, UK, 2000. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Approximation of frequency queris by means of free-sets. In PKDD '00: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pages 75--85, London, UK, 2000. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 857--872, Beijing, China, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1997. Translator-C. Franzke. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Q. Gao, M. Li, and P. Vitányi. Applying mdl to learn best model granularity. Artificial Intelligence, 121(1--2):1--29, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. D. Grünwald, I. J. Myung, and M. A. Pitt. Advances in Minimum Description Length: Theory and Applications (Neural Information Processing). The MIT Press, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Grünwald. A tutorial introduction to the minimum description length principle. The Computing Research Repository, math.ST/0406077, 2004.Google ScholarGoogle Scholar
  8. H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21(1):213--221, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining, page 549, Washington, DC, USA, 2003. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Huan, W. Wang, J. Prins, and J. Yang. Spin: mining maximal frequent subgraphs from graph databases. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 581--586, Seattle, WA, USA, 2004. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PKDD '00: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pages 13--23, London, UK, 2000. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM '01: Proceedings of the 2001 IEEE International Conference on Data Mining, pages 313--320, Washington, DC, USA, 2001. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph*. Data Mining and Knowledge Discovery, 11(3):243--271, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Li, H. Li, L. Wong, J. Pei, and G. Dong. Minimum description length principle: Generators are preferable to closed patterns. In AAAI '06, Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Application of Artificial Intelligence. AAAI Press, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Li and P. Vitányi. An introduction to Kolmogorov complexity and its applications (2nd ed.). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Rissanen. Modelling by the shortest data description. Automatica, 14:465--471, 1978.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. K. Sim, J. Li, V. Gopalkrishnan, and G. Liu. Mining maximal quasi-bicliques to co-cluster stocks and financial ratios for value investment. In ICDM '06: Proceedings of the Sixth International Conference on Data Mining, pages 1059--1063, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Sun, C. Faloutsos, S. Papadimitriou, and P. S. Yu. Graphscope: parameter-free mining of large time-evolving graphs. In KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 687--696, San Jose, California, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. T. Thomas, S. R. Valluri, and K. Karlapalem. Margin: Maximal frequent subgraph mining. In ICDM '06: Proceedings of the Sixth International Conference on Data Mining, pages 1097--1101, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. R. M. H. Ting and J. Bailey. Mining minimal contrast subgraph patterns. In SDM'06: Proceedings of the Sixth SIAM International Conference on Data Mining, Bethesda, MD, USA, 2006. SIAM.Google ScholarGoogle ScholarCross RefCross Ref
  21. H. Tong, C. Faloutsos, and Y. Koren. Fast direction-aware proximity for graph mining. In KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 747--756, San Jose, California, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi. Scalable mining of large disk-based graph databases. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 581--586, Seattle, WA, USA, 2004. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. Wang, W. Hsu, M. L. Lee, and C. Sheng. A partition-based approach to graph mining. In ICDE '06: Proceedings of the 22nd International Conference on Data Engineering, page 74, Atlanta, USA, 2006. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Wang, Z. Zeng, and L. Zhou. Clan:an algorithm for mining closed cliques from large dense graph databases. In ICDE '06: Proceedings of the 22nd International Conference on Data Engineering, page 73, Atlanta, USA, 2006. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. D. Williams, J. Huan, and W. Wang. Graph database indexing using structured graph decomposition. In ICDE '07, IEEE 23rd International Conference on Data Engineering, pages 976--985, Istanbul, Turkey, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  26. X. Yan and J. H. and. Closegraph: mining closed frequent graph patterns. In KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 286--295, Washington, D.C., 2003. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM '02: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02), page 721, Washington, DC, USA, 2002. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. X. Yan, P. S. Yu, and J. Han. Graph indexing: a frequent structure-based approach. In SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 335--346, Paris, France, 2004. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. X. Yan, X. J. Zhou, and J. Han. Mining closed relational graphs with connectivity constraints. In KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 324--333, Chicago, Illinois, USA, 2005. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. J. Zaki. Efficiently mining frequent trees in a forest. In KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 71--80, Edmonton, Alberta, Canada, 2002. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Z. Zeng, J. Wang, L. Zhou, and G. Karypis. Coherent closed quasi-clique discovery from large dense graph databases. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 797--802, Philadelphia, PA, USA, 2006. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. S. Zhang, J. Yang, and V. Cheedella. Monkey: Approximate graph mining based on spanning trees. In ICDE '07, IEEE 23rd International Conference on Data Engineering, pages 1247--1249, Istanbul, Turkey, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  33. P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: tree + delta <= graph. In VLDB '07: Proceedings of the 33rd international conference on Very large data bases, pages 938--949, Vienna, Austria, 2007. VLDB Endowment. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Other conferences
    EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
    March 2009
    1180 pages
    ISBN:9781605584225
    DOI:10.1145/1516360

    Copyright © 2009 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 24 March 2009

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate7of10submissions,70%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader