Graph Classification Methods in Chemoinformatics

Handbook of Statistical Bioinformatics

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)

Abstract

Graphs are general and powerful data structures that can represent diverse kinds of molecular objects such as chemical compounds, proteins, and RNAs. In recent years, advanced graph mining methods have made it possible to analyze tens of thousands of labeled graphs computationally. For example, frequent pattern mining methods such as gSpan can efficiently enumerate all frequent subgraphs in a graph database. This chapter reviews the basics of graph mining methodology and its applications to chemoinformatics and bioinformatics. Graph classification and regression techniques based on subgraph patterns are also reviewed extensively.
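
To make the notion of a frequent subgraph concrete, the following is a minimal sketch of support counting over a toy graph database. It is not gSpan: it considers only one-edge patterns, whereas gSpan grows larger subgraphs by depth-first search over canonical codes. The data, the `GRAPHS` layout, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy database of labeled graphs (illustration only).
# Each graph is (vertex_labels, edges): vertex_labels maps a vertex id to an
# atom label, and each edge is (u, v, bond_label).
GRAPHS = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "double")]),
    ({0: "C", 1: "O", 2: "N"}, [(0, 1, "double"), (0, 2, "single")]),
    ({0: "C", 1: "C", 2: "N"}, [(0, 1, "single"), (1, 2, "triple")]),
]


def edge_patterns(graph):
    """Return the set of canonical one-edge patterns occurring in a graph."""
    labels, edges = graph
    patterns = set()
    for u, v, bond in edges:
        # Sort the endpoint labels so that C-O and O-C are the same pattern.
        a, b = sorted((labels[u], labels[v]))
        patterns.add((a, bond, b))
    return patterns


def frequent_edge_patterns(graphs, min_support):
    """Keep the one-edge patterns that occur in at least min_support graphs."""
    support = Counter()
    for graph in graphs:
        # A set per graph means each graph counts at most once per pattern.
        support.update(edge_patterns(graph))
    return {p: s for p, s in support.items() if s >= min_support}


frequent = frequent_edge_patterns(GRAPHS, min_support=2)
```

With `min_support=2`, only the C-C single bond and the C=O double bond survive, since each of the other patterns occurs in a single graph. Support is anti-monotone: a pattern cannot be more frequent than any of its sub-patterns, so a full miner only needs to extend patterns that are already frequent.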

Corresponding author

Correspondence to Koji Tsuda.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Tsuda, K. (2011). Graph Classification Methods in Chemoinformatics. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_16
