research-article

Semi-supervised feature selection for graph classification

Authors:

Philip S. Yu

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 793 - 802

https://doi.org/10.1145/1835804.1835905

Published: 25 July 2010

Abstract

Graph classification has attracted great interest over the last decade. Current research on graph classification assumes that large amounts of labeled training graphs are available. In many applications, however, labels for graph data are expensive or difficult to obtain, while unlabeled graph data are often plentiful. In this paper, we study the problem of semi-supervised feature selection for graph classification and propose a novel solution, called gSSC, that efficiently searches for optimal subgraph features using both labeled and unlabeled graphs. Unlike existing feature selection methods in vector spaces, which assume the feature set is given, we perform semi-supervised feature selection for graph data progressively, together with the subgraph feature mining process. We derive a feature evaluation criterion, named gSemi, to estimate the usefulness of subgraph features based on both labeled and unlabeled graphs. We then propose a branch-and-bound algorithm that efficiently searches for optimal subgraph features by judiciously pruning the subgraph search space. Empirical studies on several real-world tasks demonstrate that our semi-supervised feature selection approach effectively boosts graph classification performance and is very efficient, because it prunes the subgraph search space using both labeled and unlabeled graphs.
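The branch-and-bound idea described in the abstract can be illustrated with a minimal sketch. This is not gSSC and the score is not gSemi, whose definition is not given on this page. As assumptions for illustration only: each "graph" is reduced to a set of items, a "subgraph feature" is an item set, and `score`, `upper_bound`, `top_k_patterns`, and the weight `lam` are hypothetical names. The toy score rewards patterns that separate the labeled classes and that split the unlabeled data evenly. Because support can only shrink as a pattern grows, the score of any super-pattern can be bounded from above, and a branch is pruned once that bound cannot beat the current k-th best score.

```python
import heapq


def support(pattern, graphs):
    """Count the 'graphs' (here: item sets) that contain the pattern."""
    return sum(1 for g in graphs if pattern <= g)


def score(pattern, pos, neg, unl, lam=0.5):
    """Toy semi-supervised score (an assumption, NOT the paper's gSemi)."""
    p, n, u = support(pattern, pos), support(pattern, neg), support(pattern, unl)
    total = len(unl)
    # Labeled term: class separation. Unlabeled term: how evenly the
    # pattern splits the unlabeled graphs.
    return abs(p - n) + lam * u * (total - u) / max(total, 1)


def upper_bound(pattern, pos, neg, unl, lam=0.5):
    """Bound on the score of every super-pattern of `pattern`.

    Supports are anti-monotone, so a super-pattern has p' <= p, n' <= n,
    u' <= u. Hence |p' - n'| <= max(p, n), and u' * (U - u') is largest
    at the reachable value closest to U / 2.
    """
    p, n, u = support(pattern, pos), support(pattern, neg), support(pattern, unl)
    total = len(unl)
    best_u = min(u, total // 2)
    return max(p, n) + lam * best_u * (total - best_u) / max(total, 1)


def top_k_patterns(pos, neg, unl, k=2, lam=0.5):
    """Depth-first pattern growth with branch-and-bound pruning.

    pos, neg, unl are lists of sets. Returns the k best (score, pattern)
    pairs and the number of patterns that were actually scored.
    """
    items = sorted(set().union(*pos, *neg, *unl))
    everything = pos + neg + unl
    heap = []  # min-heap holding the current top-k (score, pattern) pairs
    visited = 0

    def dfs(pattern, start):
        nonlocal visited
        for i in range(start, len(items)):
            child = pattern | {items[i]}
            if support(child, everything) == 0:
                continue  # pattern occurs nowhere, and neither do its super-patterns
            visited += 1
            entry = (score(child, pos, neg, unl, lam), tuple(sorted(child)))
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)
            # Prune: no super-pattern of `child` can beat the k-th best score.
            if len(heap) == k and upper_bound(child, pos, neg, unl, lam) <= heap[0][0]:
                continue
            dfs(child, i + 1)

    dfs(frozenset(), 0)
    return sorted(heap, reverse=True), visited


if __name__ == "__main__":
    pos = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
    neg = [{"c", "d"}, {"d"}, {"b", "d"}]
    unl = [{"a", "b"}, {"d"}, {"a"}, {"c", "d"}]
    best, visited = top_k_patterns(pos, neg, unl, k=2)
    print(best, visited)
```

In a real subgraph miner the item-set extension step would be replaced by a canonical subgraph enumeration such as gSpan [15], and the support test by subgraph isomorphism; the pruning logic keeps the same shape.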

Index Terms

Semi-supervised feature selection for graph classification
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

July 2010

1240 pages

ISBN:9781450300551

DOI:10.1145/1835804

General Chairs: Bharat Rao (Siemens), Balaji Krishnapuram (Siemens)

Program Chairs: Andrew Tomkins (Google Inc.), Qiang Yang (Hong Kong University of Science and Technology)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '10

KDD '10: The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

July 25 - 28, 2010

Washington, DC, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
