Skip to main content

Graph Classification

  • Chapter
  • First Online:
Managing and Mining Graph Data

Part of the book series: Advances in Database Systems ((ADBS,volume 40))

Abstract

Supervised learning on graphs is a central subject in graph data processing. In graph classification and regression, we assume that the target values of a certain number of graphs or a certain part of a graph are available as a training dataset, and our goal is to derive the target values of other graphs or the remaining part of the graph. In drug discovery applications, for example, a graph and its target value correspond to a chemical compound and its chemical activity. In this chapter, we review state-of-the-art methods of graph classification. In particular, we focus on two representative methods, graph kernels and graph boosting, and we present other methods in relation to the two methods. We describe the strengths and weaknesses of different graph classification methods and recent efforts to overcome the challenges.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 169.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 219.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 219.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proc. VLDB 1994, pages 487–499, 1994.

    Google Scholar 

  2. T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Sakamoto, and S. Arikawa. Efficient substructure discovery from large semi-structured data. In Proc 2nd SIAM Data Mining Conference (SDM), pages 158–174, 2002.

    Google Scholar 

  3. R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, PA, 1994.

    Google Scholar 

  4. M. Boley and H. Grosskreutz. A randomized approach for approximating the number of frequent sets. In Proceedings of the 8th IEEE International Conference on Data Mining, pages 43–52, 2008.

    Google Scholar 

  5. K. M. Borgwardt, C. S. Ong, S. Schonauer, S. V. N. Vishwanathan, A. J. Smola, and H.-P. Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21(suppl. 1):i47–i56, 2006.

    Google Scholar 

  6. O. Chapelle, A. Zien, and B. Scholkopf, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.

    Google Scholar 

  7. T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press and McGraw Hill, 1990.

    Google Scholar 

  8. A. Demiriz, K.P. Bennet, and J. Shawe-Taylor. Linear programming boosting via column generation. Machine Learning, 46(1–3):225–254, 2002.

    Article  MATH  Google Scholar 

  9. M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis. Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans. Knowl. Data Eng., 17(8):1036–1050, 2005.

    Article  Google Scholar 

  10. O. du Merle, D. Villeneuve, J. Desrosiers, and P. Hansen. Stabilized column generation. Discrete Mathematics, 194:229–237, 1999.

    Article  MATH  MathSciNet  Google Scholar 

  11. F. Eichinger, K. Bohm, and M. Huber. Mining edge-weighted call graphs to localise software bugs. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), pages 333–348, 2008.

    Google Scholar 

  12. T. Gartner, P. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In Proc. of the Sixteenth Annual Conference on Computational Learning Theory, 2003.

    Google Scholar 

  13. I. Guyon, J. Weston, S. Bahnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1–3):389–422, 2002.

    Article  MATH  Google Scholar 

  14. J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.

    Google Scholar 

  15. Z. Harchaoui and F. Bach. Image classification with segmentation graph kernels. In 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2007.

    Google Scholar 

  16. C. Helma, T. Cramer, S. Kramer, and L.D. Raedt. Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. J. Chem. Inf. Comput. Sci., 44:1402–1411, 2004.

    Google Scholar 

  17. T. Horvath, T. Gartner, and S. Wrobel. Cyclic pattern kernels for predictive graph mining. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 158–167, 2004.

    Google Scholar 

  18. A. Inokuchi. Mining generalized substructures from a set of labeled graphs. In Proceedings of the 4th IEEE Internatinal Conference on Data Mining, pages 415–418. IEEE Computer Society, 2005.

    Google Scholar 

  19. Y. Karklin, R.F. Meraz, and S.R. Holbrook. Classification of non-coding rna using graph representations of secondary structure. In Pacific Symposium on Biocomputing, pages 4–15, 2005.

    Google Scholar 

  20. H. Kashima, T. Kato, Y. Yamanishi, M. Sugiyama, and K. Tsuda. Link propagation: A fast semi-supervised learning algorithm for link prediction. In 2009 SIAM Conference on Data Mining, pages 1100–1111, 2009.

    Google Scholar 

  21. H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 21st International Conference on Machine Learning, pages 321–328. AAAI Press, 2003.

    Google Scholar 

  22. T. Kato, H. Kashima, and M. Sugiyama. Robust label propagation on multiple networks. IEEE Trans. Neural Networks, 20(1):35–44, 2008.

    Article  Google Scholar 

  23. J. Kazius, S. Nijssen, J. Kok, T. Back, and A.P. Ijzerman. Substructure mining using elaborate chemical representation. J. Chem. Inf. Model., 46:597–605, 2006.

    Article  Google Scholar 

  24. R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 1–2:273–324, 1997.

    Article  Google Scholar 

  25. R. I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete input. In ICML 2002, 2002.

    Google Scholar 

  26. T. Kudo, E. Maeda, and Y. Matsumoto. An application of boosting to graph classification. In Advances in Neural Information Processing Systems 17, pages 729–736. MIT Press, 2005.

    Google Scholar 

  27. D. G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.

    Google Scholar 

  28. P. Mahe, N. Ueda, T. Akutsu, J.-L. Perret, and J.-P. Vert. Graph kernels for molecular structure - activity relationship analysis with support vector machines. J. Chem. Inf. Model., 45:939–951, 2005.

    Article  Google Scholar 

  29. P. Mahe and J.P. Vert. Graph kernels based on tree patterns for molecules. Machine Learning, 75:3–35, 2009.

    Article  Google Scholar 

  30. S. Morishita. Computing optimal hypotheses efficiently for boosting. In Discovery Science, pages 471–481, 2001.

    Google Scholar 

  31. S. Morishita and J. Sese. Traversing itemset lattices with statistical metric pruning. In Proceedings of ACM SIGACT-SIGMOD-SIGART Symposium on Database Systems (PODS), pages 226–236, 2000.

    Google Scholar 

  32. S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biology, 9(Suppl. 1):S4, 2008.

    Article  Google Scholar 

  33. S. Nijssen and J.N. Kok. A quickstart in frequent structure mining can make a difference. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 647–652. ACM Press, 2004.

    Google Scholar 

  34. S. Nowozin, K. Tsuda, T. Uno, T. Kudo, and G. Bakir. Weighted substructure mining for image analysis. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 2007.

    Google Scholar 

  35. J. Pei, J. Han, B. Mortazavi-asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. Mining sequential patterns by pattern-growth: The prefixs-pan approach. IEEE Transactions on Knowledge and Data Engineering, 16(11):1424–1440, 2004.

    Article  Google Scholar 

  36. G. Ratsch, S. Mika, B. Scholkopf, and K.-R. Muller. Constructing boosting algorithms from SVMs: an application to one-class classification. IEEE Trans. Patt. Anal. Mach. Intell., 24(9):1184–1199, 2002.

    Article  Google Scholar 

  37. R. Rosipal and N. Kramer. Overview and recent advances in partial least squares. In Subspace, Latent Structure and Feature Selection Techniques, pages 34–51. Springer, 2006.

    Google Scholar 

  38. W.J. Rugh. Linear System Theory. Prentice Hall, 1995.

    Google Scholar 

  39. H. Saigo, N. Kramer, and K. Tsuda. Partial least squares regression for graph mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 578–586, 2008.

    Google Scholar 

  40. H. Saigo, S. Nowozin, T. Kadowaki, T. Kudo, and K. Tsuda. GBoost: A mathematical programming approach to graph classification and regression. Machine Learning, 2008.

    Google Scholar 

  41. A. Sanfeliu and K.S. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern., 13:353–362, 1983.

    MATH  Google Scholar 

  42. B. Scholkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.

    Google Scholar 

  43. H. Shin, A.M. Lisewski, and O. Lichtarge. Graph sharpening plus graph integration: a synergy that improves protein functional classification. Bioinformatics, 23:3217–3224, 2007.

    Article  Google Scholar 

  44. A. Subramanya and J. Bilmes. Soft-supervised learning for text classification. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 1090–1099, 2008.

    Google Scholar 

  45. K. Tsuda. Entire regularization paths for graph data. In Proceedings of the 24th International Conference on Machine Learning, pages 919–926, 2007.

    Google Scholar 

  46. K. Tsuda and T. Kudo. Clustering graphs by weighted substructure mining. In Proceedings of the 23rd International Conference on Machine Learning, pages 953–960. ACM Press, 2006.

    Google Scholar 

  47. K. Tsuda and K. Kurihara. Graph mining with variational dirichlet process mixture models. In SIAM Conference on Data Mining (SDM), 2008.

    Google Scholar 

  48. K. Tsuda and W.S. Noble. Learning kernels from biological networks by maximizing entropy. Bioinformatics, 20(Suppl. 1):i326–i333, 2004.

    Article  Google Scholar 

  49. K. Tsuda, H.J. Shin, and B. Scholkopf. Fast protein classification with multiple networks. Bioinformatics, 21(Suppl. 2):ii59–ii65, 2005.

    Article  Google Scholar 

  50. S.V.N. Vishwanathan, K.M. Borgwardt, and N.N. Schraudolph. Fast computation of graph kernels. In Advances in Neural Information Processing Systems 19, Cambridge, MA, 2006. MIT Press.

    Google Scholar 

  51. N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. In Proceedings of the 2006 IEEE International Conference on Data Mining, pages 678–689, 2006.

    Google Scholar 

  52. X. Yan and J. Han. gSpan: graph-based substructure pattern mining. In Proceedings of the 2002 IEEE International Conference on Data Mining, pages 721–724. IEEE Computer Society, 2002.

    Google Scholar 

  53. D. Zhou, O. Bousquet, J. Weston, and B. Scholkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems (NIPS) 16, pages 321–328. MIT Press, 2004.

    Google Scholar 

  54. X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In Proc. of the Twentieth International Conference on Machine Learning (ICML), pages 912–919. AAAI Press, 2003.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Koji Tsuda .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag US

About this chapter

Cite this chapter

Tsuda, K., Saigo, H. (2010). Graph Classification. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_11

Download citation

  • DOI: https://doi.org/10.1007/978-1-4419-6045-0_11

  • Published:

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-6044-3

  • Online ISBN: 978-1-4419-6045-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics