research-article

Large-scale graph mining using backbone refinement classes

Authors:
Andreas Maunz, Albert-Ludwigs-Universität, Freiburg, Germany
Christoph Helma, in-silico Toxicology, Basel, Switzerland
Stefan Kramer, Technische Universität, München, Germany

KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 2009, pages 617–626. https://doi.org/10.1145/1557019.1557089

Published: 28 June 2009

ABSTRACT

We present a new approach to large-scale graph mining based on so-called backbone refinement classes. The method efficiently mines tree-shaped subgraph descriptors under minimum frequency and significance constraints, using classes of fragments to reduce feature set size and running times. The classes are defined in terms of fragments sharing a common backbone. Rather than optimizing occurrences, as is characteristic of open or closed fragment mining, the method optimizes structural inter-feature entropy. In the experiments, the proposed method reduces feature set sizes by more than 90% compared to complete tree mining and by more than 30% compared to open tree mining. Cross-validation runs show that the classification accuracy of the resulting descriptors is similar to that of the complete set of trees and significantly better than that of open trees. Compared to open or closed fragment mining, a large part of the search space can be pruned due to an improved statistical constraint (dynamic upper bound adjustment); the experiments confirm this with lower running times than ordinary (static) upper bound pruning. Further analysis using large-scale datasets yields insight into important properties of the proposed descriptors, such as the dataset coverage and the class size represented by each descriptor. A final cross-validation run confirms that the novel descriptors make large training sets feasible that might previously have been intractable.
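The abstract contrasts the paper's dynamic upper bound adjustment with ordinary (static) upper bound pruning. For reference, the following Python sketch shows a static, convexity-based chi-square bound in the spirit of Morishita and Sese's statistical metric pruning. It is not the authors' dynamic adjustment or their backbone refinement class miner. The helper names `chi2`, `chi2_upper_bound`, and `can_prune`, the counts `n`, `m`, `x`, `y`, and the choice of a two-class chi-square test with critical value 3.841 (one degree of freedom, alpha = 0.05) as the significance constraint are all illustrative assumptions.

```python
# Sketch: static chi-square upper-bound pruning for fragment mining.
# Assumption: two classes and a chi-square significance constraint; this is
# the "ordinary (static)" baseline, NOT the paper's dynamic adjustment.

CRITICAL_005 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def chi2(n: int, m: int, x: int, y: int) -> float:
    """Chi-square statistic of the 2x2 table (fragment present/absent vs. class).

    n: graphs in the dataset, m: positive graphs,
    x: graphs containing the fragment, y: positive graphs containing it.
    """
    denom = x * (n - x) * m * (n - m)
    if denom == 0:
        # Fragment in no graph or in every graph, or only one class present:
        # the table carries no class information.
        return 0.0
    # Closed form n * (ad - bc)^2 / (product of marginals), where ad - bc = y*n - x*m.
    return n * (y * n - x * m) ** 2 / denom


def chi2_upper_bound(n: int, m: int, x: int, y: int) -> float:
    """Upper bound on chi2 over all refinements of a fragment with counts (x, y).

    A refinement occurs only in graphs that contain the fragment, so it covers
    y' <= y positives and x' - y' <= x - y negatives. chi2 is convex over this
    rectangle, so its maximum lies at a corner; the corner (0, 0) scores 0.
    """
    return max(
        chi2(n, m, y, y),      # refinement covering only the positive graphs
        chi2(n, m, x - y, 0),  # refinement covering only the negative graphs
        chi2(n, m, x, y),      # refinement with the same occurrences as the fragment
    )


def can_prune(n: int, m: int, x: int, y: int, min_freq: int, critical: float) -> bool:
    """True if no refinement can satisfy both the frequency and significance constraints."""
    if x < min_freq:
        # Frequency is anti-monotone: refinements occur in at most x graphs.
        return True
    return chi2_upper_bound(n, m, x, y) < critical


if __name__ == "__main__":
    print(chi2(100, 50, 20, 15))                                          # 6.25
    print(can_prune(100, 50, 4, 2, min_freq=2, critical=CRITICAL_005))    # True
    print(can_prune(100, 50, 20, 12, min_freq=2, critical=CRITICAL_005))  # False
```

The third case illustrates the cost of a static bound: the fragment itself is not significant (chi-square 1.0), yet it cannot be pruned, because a refinement restricted to its 12 positive occurrences could still reach about 13.6. An improved bound, such as the dynamic adjustment described in the abstract, aims to discard more of these branches.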

Supplemental Material: p617-maunz.mp4 (MP4 video, 83.6 MB)

Published in:
KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 2009, 1426 pages
ISBN: 9781605584959
DOI: 10.1145/1557019
General Chairs: John Elder (Elder Research, Inc., USA), Françoise Soulié Fogelman (KXEN, France)
Program Chairs: Peter Flach (University of Bristol, UK), Mohammed Zaki (RPI, USA)
Copyright © 2009 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: classification, graph-mining, pruning