Graph Classification

Tsuda, Koji; Saigo, Hiroto

doi:10.1007/978-1-4419-6045-0_11

Koji Tsuda³ &
Hiroto Saigo⁴

Part of the book series: Advances in Database Systems ((ADBS,volume 40))

7496 Accesses
11 Citations

Abstract

Supervised learning on graphs is a central subject in graph data processing. In graph classification and regression, we assume that the target values of a certain number of graphs or a certain part of a graph are available as a training dataset, and our goal is to derive the target values of other graphs or the remaining part of the graph. In drug discovery applications, for example, a graph and its target value correspond to a chemical compound and its chemical activity. In this chapter, we review state-of-the-art methods of graph classification. In particular, we focus on two representative methods, graph kernels and graph boosting, and we present other methods in relation to the two methods. We describe the strengths and weaknesses of different graph classification methods and recent efforts to overcome the challenges.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 169.00; Price excludes VAT (USA)

Softcover Book: USD 219.99; Price excludes VAT (USA)

Hardcover Book: USD 219.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proc. VLDB 1994, pages 487–499, 1994.
Google Scholar
T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Sakamoto, and S. Arikawa. Efficient substructure discovery from large semi-structured data. In Proc 2nd SIAM Data Mining Conference (SDM), pages 158–174, 2002.
Google Scholar
R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, PA, 1994.
Google Scholar
M. Boley and H. Grosskreutz. A randomized approach for approximating the number of frequent sets. In Proceedings of the 8th IEEE International Conference on Data Mining, pages 43–52, 2008.
Google Scholar
K. M. Borgwardt, C. S. Ong, S. Schonauer, S. V. N. Vishwanathan, A. J. Smola, and H.-P. Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21(suppl. 1):i47–i56, 2006.
Google Scholar
O. Chapelle, A. Zien, and B. Scholkopf, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.
Google Scholar
T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press and McGraw Hill, 1990.
Google Scholar
A. Demiriz, K.P. Bennet, and J. Shawe-Taylor. Linear programming boosting via column generation. Machine Learning, 46(1–3):225–254, 2002.
Article MATH Google Scholar
M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis. Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans. Knowl. Data Eng., 17(8):1036–1050, 2005.
Article Google Scholar
O. du Merle, D. Villeneuve, J. Desrosiers, and P. Hansen. Stabilized column generation. Discrete Mathematics, 194:229–237, 1999.
Article MATH MathSciNet Google Scholar
F. Eichinger, K. Bohm, and M. Huber. Mining edge-weighted call graphs to localise software bugs. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), pages 333–348, 2008.
Google Scholar
T. Gartner, P. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In Proc. of the Sixteenth Annual Conference on Computational Learning Theory, 2003.
Google Scholar
I. Guyon, J. Weston, S. Bahnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1–3):389–422, 2002.
Article MATH Google Scholar
J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
Google Scholar
Z. Harchaoui and F. Bach. Image classification with segmentation graph kernels. In 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2007.
Google Scholar
C. Helma, T. Cramer, S. Kramer, and L.D. Raedt. Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. J. Chem. Inf. Comput. Sci., 44:1402–1411, 2004.
Google Scholar
T. Horvath, T. Gartner, and S. Wrobel. Cyclic pattern kernels for predictive graph mining. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 158–167, 2004.
Google Scholar
A. Inokuchi. Mining generalized substructures from a set of labeled graphs. In Proceedings of the 4th IEEE Internatinal Conference on Data Mining, pages 415–418. IEEE Computer Society, 2005.
Google Scholar
Y. Karklin, R.F. Meraz, and S.R. Holbrook. Classification of non-coding rna using graph representations of secondary structure. In Pacific Symposium on Biocomputing, pages 4–15, 2005.
Google Scholar
H. Kashima, T. Kato, Y. Yamanishi, M. Sugiyama, and K. Tsuda. Link propagation: A fast semi-supervised learning algorithm for link prediction. In 2009 SIAM Conference on Data Mining, pages 1100–1111, 2009.
Google Scholar
H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 21st International Conference on Machine Learning, pages 321–328. AAAI Press, 2003.
Google Scholar
T. Kato, H. Kashima, and M. Sugiyama. Robust label propagation on multiple networks. IEEE Trans. Neural Networks, 20(1):35–44, 2008.
Article Google Scholar
J. Kazius, S. Nijssen, J. Kok, T. Back, and A.P. Ijzerman. Substructure mining using elaborate chemical representation. J. Chem. Inf. Model., 46:597–605, 2006.
Article Google Scholar
R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 1–2:273–324, 1997.
Article Google Scholar
R. I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete input. In ICML 2002, 2002.
Google Scholar
T. Kudo, E. Maeda, and Y. Matsumoto. An application of boosting to graph classification. In Advances in Neural Information Processing Systems 17, pages 729–736. MIT Press, 2005.
Google Scholar
D. G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.
Google Scholar
P. Mahe, N. Ueda, T. Akutsu, J.-L. Perret, and J.-P. Vert. Graph kernels for molecular structure - activity relationship analysis with support vector machines. J. Chem. Inf. Model., 45:939–951, 2005.
Article Google Scholar
P. Mahe and J.P. Vert. Graph kernels based on tree patterns for molecules. Machine Learning, 75:3–35, 2009.
Article Google Scholar
S. Morishita. Computing optimal hypotheses efficiently for boosting. In Discovery Science, pages 471–481, 2001.
Google Scholar
S. Morishita and J. Sese. Traversing itemset lattices with statistical metric pruning. In Proceedings of ACM SIGACT-SIGMOD-SIGART Symposium on Database Systems (PODS), pages 226–236, 2000.
Google Scholar
S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biology, 9(Suppl. 1):S4, 2008.
Article Google Scholar
S. Nijssen and J.N. Kok. A quickstart in frequent structure mining can make a difference. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 647–652. ACM Press, 2004.
Google Scholar
S. Nowozin, K. Tsuda, T. Uno, T. Kudo, and G. Bakir. Weighted substructure mining for image analysis. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 2007.
Google Scholar
J. Pei, J. Han, B. Mortazavi-asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. Mining sequential patterns by pattern-growth: The prefixs-pan approach. IEEE Transactions on Knowledge and Data Engineering, 16(11):1424–1440, 2004.
Article Google Scholar
G. Ratsch, S. Mika, B. Scholkopf, and K.-R. Muller. Constructing boosting algorithms from SVMs: an application to one-class classification. IEEE Trans. Patt. Anal. Mach. Intell., 24(9):1184–1199, 2002.
Article Google Scholar
R. Rosipal and N. Kramer. Overview and recent advances in partial least squares. In Subspace, Latent Structure and Feature Selection Techniques, pages 34–51. Springer, 2006.
Google Scholar
W.J. Rugh. Linear System Theory. Prentice Hall, 1995.
Google Scholar
H. Saigo, N. Kramer, and K. Tsuda. Partial least squares regression for graph mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 578–586, 2008.
Google Scholar
H. Saigo, S. Nowozin, T. Kadowaki, T. Kudo, and K. Tsuda. GBoost: A mathematical programming approach to graph classification and regression. Machine Learning, 2008.
Google Scholar
A. Sanfeliu and K.S. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern., 13:353–362, 1983.
MATH Google Scholar
B. Scholkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
Google Scholar
H. Shin, A.M. Lisewski, and O. Lichtarge. Graph sharpening plus graph integration: a synergy that improves protein functional classification. Bioinformatics, 23:3217–3224, 2007.
Article Google Scholar
A. Subramanya and J. Bilmes. Soft-supervised learning for text classification. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 1090–1099, 2008.
Google Scholar
K. Tsuda. Entire regularization paths for graph data. In Proceedings of the 24th International Conference on Machine Learning, pages 919–926, 2007.
Google Scholar
K. Tsuda and T. Kudo. Clustering graphs by weighted substructure mining. In Proceedings of the 23rd International Conference on Machine Learning, pages 953–960. ACM Press, 2006.
Google Scholar
K. Tsuda and K. Kurihara. Graph mining with variational dirichlet process mixture models. In SIAM Conference on Data Mining (SDM), 2008.
Google Scholar
K. Tsuda and W.S. Noble. Learning kernels from biological networks by maximizing entropy. Bioinformatics, 20(Suppl. 1):i326–i333, 2004.
Article Google Scholar
K. Tsuda, H.J. Shin, and B. Scholkopf. Fast protein classification with multiple networks. Bioinformatics, 21(Suppl. 2):ii59–ii65, 2005.
Article Google Scholar
S.V.N. Vishwanathan, K.M. Borgwardt, and N.N. Schraudolph. Fast computation of graph kernels. In Advances in Neural Information Processing Systems 19, Cambridge, MA, 2006. MIT Press.
Google Scholar
N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. In Proceedings of the 2006 IEEE International Conference on Data Mining, pages 678–689, 2006.
Google Scholar
X. Yan and J. Han. gSpan: graph-based substructure pattern mining. In Proceedings of the 2002 IEEE International Conference on Data Mining, pages 721–724. IEEE Computer Society, 2002.
Google Scholar
D. Zhou, O. Bousquet, J. Weston, and B. Scholkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems (NIPS) 16, pages 321–328. MIT Press, 2004.
Google Scholar
X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In Proc. of the Twentieth International Conference on Machine Learning (ICML), pages 912–919. AAAI Press, 2003.
Google Scholar

Download references

Author information

Authors and Affiliations

Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
Koji Tsuda
Max Planck Institute for Informatics, Saarbru̶cken, Germany
Hiroto Saigo

Authors

Koji Tsuda
View author publications
You can also search for this author in PubMed Google Scholar
Hiroto Saigo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Koji Tsuda .

Editor information

Editors and Affiliations

Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, U.S.A.
Charu C. Aggarwal
Microsoft Research Asia, Zhichun Road 49, Beijing, 100080, China, People's Republic
Haixun Wang

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Tsuda, K., Saigo, H. (2010). Graph Classification. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_11

Download citation

DOI: https://doi.org/10.1007/978-1-4419-6045-0_11
Published: 18 January 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics