research-article
DOI: 10.1145/1557019.1557089

Large-scale graph mining using backbone refinement classes

Published: 28 June 2009

ABSTRACT

We present a new approach to large-scale graph mining based on so-called backbone refinement classes. The method efficiently mines tree-shaped subgraph descriptors under minimum frequency and significance constraints, using classes of fragments to reduce feature set size and running time. The classes are defined in terms of fragments sharing a common backbone. The method optimizes structural inter-feature entropy rather than occurrences, the latter being characteristic of open or closed fragment mining. In the experiments, the proposed method reduces feature set sizes by more than 90% compared to complete tree mining and by more than 30% compared to open tree mining. Evaluation using cross-validation runs shows that the classification accuracy of the resulting features is similar to that of the complete set of trees but significantly better than that of open trees. Compared to open or closed fragment mining, a large part of the search space can be pruned due to an improved statistical constraint (dynamic upper bound adjustment); the experiments confirm this with lower running times than ordinary (static) upper bound pruning. Further analysis using large-scale datasets yields insight into important properties of the proposed descriptors, such as the dataset coverage and the class size represented by each descriptor. A final cross-validation run confirms that the novel descriptors make large training sets feasible that previously might have been intractable.
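To make the notion of ordinary (static) upper bound pruning concrete, here is a minimal Python sketch. It assumes a two-class dataset and the chi-square statistic as the significance measure, and it uses the well-known convexity bound of Morishita and Sese; the abstract names neither the statistic nor the bound, so both are assumptions made for illustration. The function names (`chi_square`, `static_upper_bound`, `can_prune`) are hypothetical, and the dynamic upper bound adjustment proposed in the paper is not shown because the abstract does not specify it.

```python
def chi_square(n, m, x, y):
    """Chi-square statistic of the 2x2 contingency table of a fragment.

    n: total number of graphs, m: number of positive graphs,
    x: graphs containing the fragment, y: positive graphs containing it.
    """
    a, b = y, x - y                  # fragment present: positive / negative
    c, d = m - y, (n - m) - (x - y)  # fragment absent:  positive / negative
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no association measurable
    return n * (a * d - b * c) ** 2 / denom


def static_upper_bound(n, m, x, y):
    """Largest chi-square value any refinement of the fragment can reach.

    A refinement occurs in a subset of the fragment's occurrences, so its
    counts (x', y') satisfy y' <= y and x' - y' <= x - y. Because chi-square
    is convex, its maximum over that region lies at an extreme point:
    only the positive occurrences remain, or only the negative ones.
    """
    return max(chi_square(n, m, y, y), chi_square(n, m, x - y, 0))


def can_prune(n, m, x, y, threshold):
    """True if no refinement of the fragment can reach the threshold."""
    return static_upper_bound(n, m, x, y) < threshold
```

For example, with 100 graphs of which 50 are positive, a fragment found in 20 graphs (15 of them positive) has a chi-square value of 6.25 and a static upper bound of about 17.65. Its refinements must still be explored at a threshold of 3.84, but the whole branch can be discarded at a threshold of 20.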



Published in

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009, 1426 pages
ISBN: 9781605584959
DOI: 10.1145/1557019

Copyright © 2009 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions, 13%
