DOI: 10.1145/1835804.1835905
research-article

Semi-supervised feature selection for graph classification

Published: 25 July 2010

Abstract

The problem of graph classification has attracted great interest in the last decade. Current research on graph classification assumes the existence of large amounts of labeled training graphs. However, in many applications, the labels of graph data are very expensive or difficult to obtain, while copious amounts of unlabeled graph data are often available. In this paper, we study the problem of semi-supervised feature selection for graph classification and propose a novel solution, called gSSC, to efficiently search for optimal subgraph features using both labeled and unlabeled graphs. Unlike existing feature selection methods in vector spaces, which assume the feature set is given, we perform semi-supervised feature selection for graph data progressively, together with the subgraph feature mining process. We derive a feature evaluation criterion, named gSemi, to estimate the usefulness of subgraph features based upon both labeled and unlabeled graphs. We then propose a branch-and-bound algorithm that efficiently searches for optimal subgraph features by pruning the subgraph search space. Empirical studies on several real-world tasks demonstrate that our approach effectively boosts graph classification performance and is efficient, because it prunes the subgraph search space using both labeled and unlabeled graphs.
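
The branch-and-bound idea mentioned in the abstract can be illustrated with a small sketch. This is not the gSSC algorithm and does not reproduce the gSemi criterion, which the abstract does not define. As assumptions for illustration only: item sets stand in for subgraphs, the score is a placeholder (positives covered minus negatives covered) that uses labeled examples only, and all names (`branch_and_bound`, `score`, `upper_bound`, `DATA`, `LABELS`) are hypothetical. The sketch shows only the general pattern: grow candidate features step by step and skip a branch when an upper bound on the score of every extension cannot beat the current best.

```python
from typing import FrozenSet, List, Tuple

# Toy stand-in data (hypothetical): each "graph" is an item set, and a
# "subgraph feature" is an item set contained in it. Labels are +1 or -1.
DATA: List[FrozenSet[str]] = [
    frozenset("abc"), frozenset("ab"), frozenset("ac"),
    frozenset("bc"), frozenset("c"),
]
LABELS: List[int] = [1, 1, -1, -1, -1]


def covered(pattern: FrozenSet[str], data: List[FrozenSet[str]]) -> List[int]:
    """Indices of the examples that contain the pattern."""
    return [i for i, t in enumerate(data) if pattern <= t]


def score(support: List[int], labels: List[int]) -> int:
    """Placeholder criterion (NOT gSemi): positives covered minus negatives."""
    return sum(labels[i] for i in support)


def upper_bound(support: List[int], labels: List[int]) -> int:
    """Bound for every extension of a pattern: an extension covers a subset
    of `support`, so at best it keeps all positives and drops all negatives."""
    return sum(1 for i in support if labels[i] > 0)


def branch_and_bound(data, labels, k):
    """Return the top-k patterns by score and the number of patterns scored."""
    items = sorted(set().union(*data))
    best: List[Tuple[int, Tuple[str, ...]]] = []
    visited = 0

    def threshold() -> float:
        # Score a new pattern must exceed to enter a full top-k list.
        return best[-1][0] if len(best) == k else float("-inf")

    def grow(pattern: Tuple[str, ...], start: int) -> None:
        nonlocal visited
        for j in range(start, len(items)):
            cand = pattern + (items[j],)
            support = covered(frozenset(cand), data)
            if not support:
                continue  # occurs nowhere, so no extension occurs either
            visited += 1
            s = score(support, labels)
            if s > threshold():
                best.append((s, cand))
                best.sort(key=lambda p: (-p[0], p[1]))
                del best[k:]
            # Descend only if some extension could still beat the threshold.
            if upper_bound(support, labels) > threshold():
                grow(cand, j + 1)

    grow((), 0)
    return best, visited
```

Calling `branch_and_bound(DATA, LABELS, k=1)` on the toy data searches the item-set lattice depth first and skips any branch whose bound cannot improve on the best pattern found so far. A semi-supervised criterion would additionally need a bound that accounts for unlabeled examples; that part is left out here.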

Supplementary Material

JPG File (kdd2010_kong_ssfs_01.jpg)
MOV File (kdd2010_kong_ssfs_01.mov)


    Published In

    KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
    July 2010
    1240 pages
    ISBN:9781450300551
    DOI:10.1145/1835804
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data mining
    2. feature selection
    3. graph classification
    4. semi-supervised learning

    Qualifiers

    • Research-article

    Conference

    KDD '10
    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Downloads (Last 12 months): 54
    • Downloads (Last 6 weeks): 4
    Reflects downloads up to 27 Feb 2025

    Cited By

    • (2024) State of the Art and Potentialities of Graph-level Learning. ACM Computing Surveys 57:2 (1-40). DOI: 10.1145/3695863. Online publication date: 10-Oct-2024
    • (2024) Null Model-Based Data Augmentation for Graph Classification. IEEE Transactions on Network Science and Engineering 11:2 (1821-1833). DOI: 10.1109/TNSE.2023.3332499. Online publication date: Mar-2024
    • (2024) Hierarchical Graph Capsule Networks for Molecular Function Classification With Disentangled Representations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 21:4 (1072-1082). DOI: 10.1109/TCBB.2022.3233354. Online publication date: Jul-2024
    • (2024) Uncovering emerging technologies in intelligent manufacturing via graph classification of community characteristics. Journal of Engineering Design (1-25). DOI: 10.1080/09544828.2024.2411486. Online publication date: 3-Oct-2024
    • (2022) g-Inspector: Recurrent Attention Model on Graph. IEEE Transactions on Knowledge and Data Engineering 34:2 (680-690). DOI: 10.1109/TKDE.2020.2983689. Online publication date: 1-Feb-2022
    • (2022) Semisupervised Feature Selection via Structured Manifold Learning. IEEE Transactions on Cybernetics 52:7 (5756-5766). DOI: 10.1109/TCYB.2021.3052847. Online publication date: Jul-2022
    • (2022) Feature Selection and Classification using a Positive Learning Approach Focused on Graph and Neural Network. 2022 6th International Conference on Electronics, Communication and Aerospace Technology (01-07). DOI: 10.1109/ICECA55336.2022.10009427. Online publication date: 1-Dec-2022
    • (2022) Vertical federated learning-based feature selection with non-overlapping sample utilization. Expert Systems with Applications 208 (118097). DOI: 10.1016/j.eswa.2022.118097. Online publication date: Dec-2022
    • (2022) Bipartite graph capsule network. World Wide Web 26:1 (421-440). DOI: 10.1007/s11280-022-01009-2. Online publication date: 14-Feb-2022
    • (2022) Graph Classification via Graph Structure Learning. Intelligent Information and Database Systems (269-281). DOI: 10.1007/978-3-031-21967-2_22. Online publication date: 9-Dec-2022