Skip to main content

A Survey of Graph Mining Techniques for Biological Datasets

  • Chapter
  • First Online:
Managing and Mining Graph Data

Part of the book series: Advances in Database Systems ((ADBS,volume 40))

Abstract

Mining structured information has been the source of much research in the data mining community over the last decade. The field of bioinformatics has emerged as important application area in this context. Examples abound ranging from the analysis of protein interaction networks to the analysis of phylogenetic data. In this article we survey the principal results in the field examining them both from the algorithmic contributions and applicability in the domain in ques- tion. We conclude this article with a discussion of the key results and identify some interesting directions for future research.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 169.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 219.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 219.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Akutsu, T. (1992). An RNC algorithm for finding a largest common subtree of two trees. IEICE Transactions on Information and Systems, 75(1):95–101.

    Google Scholar 

  2. Aoki, K., Mamitsuka, H., Akutsu, T., and Kanehisa, M. (2005). A score matrix to reveal the hidden links in glycans. Bioinformatics, 21(8):1457–1463.

    Article  Google Scholar 

  3. Aoki, K., Ueda, N., Yamaguchi, A., Kanehisa, M., Akutsu, T., and Mamitsuka, H. (2004a). Application of a new probabilistic model for recognizing complex patterns in glycans.

    Google Scholar 

  4. Aoki, K., Yamaguchi, A., Okuno, Y., Akutsu, T., Ueda, N., Kanehisa, M., and Mamitsuka, H. (2003). Efficient tree-matching methods for accurate carbohydrate database queries. Genome Informatics Sl, pages 134–143.

    Google Scholar 

  5. Aoki, K., Yamaguchi, A., Ueda, N., Akutsu, T., Mamitsuka, H., Goto, S., and Kanehisa, M. (2004b). KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chains. Nucleic acids research, 32(Web Server Issue):W267.

    Article  Google Scholar 

  6. Asur, S., Ucar, D., and Parthasarathy, S. (2007). An ensemble framework for clustering protein protein interaction networks. Bioinformatics, 23(13):i29.

    Article  Google Scholar 

  7. Avogadri, R. and Valentini, G. (2009). Fuzzy ensemble clustering based on random projections for DNA microarray data analysis. Artificial Intelligence in Medicine, 45(2–3):173–183.

    Article  Google Scholar 

  8. Bader, G. and Hogue, C. (2003). An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinfomatics, 4:2.

    Article  Google Scholar 

  9. Bafna, V., Muthukrishnan, S., and Ravi, R. (1995). Computing similarity between RNA strings. In Combinatorial Pattern Matching (CPM), volume 937 of LNCS.

    Google Scholar 

  10. Bar-Joseph, Z., Gerber, G., Lee, T., Rinaldi, N., Yoo, J., Robert, F., Gordon, D., Fraenkel, E., Jaakkola, T., Young, R., et al. (2003). Computational discovery of gene modules and regulatory networks. Nature Biotechnology, 21(11):1337–1342.

    Article  Google Scholar 

  11. Benedetti, G. and Morosetti, S. (1996). A graph-topological approach to recognition of pattern and similarity in RNA secondary structures. Biophysical chemistry, 59(1–2):179–184.

    Article  Google Scholar 

  12. Bille, P. (2005). A survey on tree edit distance and related problems. Theoretical computer science, 337(1–3):217–239.

    Article  MATH  MathSciNet  Google Scholar 

  13. Bohne-Lang, A., Lang, E., Forster, T., and von der Lieth, C. (2001). LIN-UCS: linear notation for unique description of carbohydrate sequences. Carbohydrate research, 336(1):1–11.

    Article  Google Scholar 

  14. Brohee, S. and van Helden, J. (2006). Evaluation of clustering algorithms for protein-protein interaction networks. BMC bioinformatics, 7(1):488.

    Article  Google Scholar 

  15. Butte, A. and Kohane, I. (2000). Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. In Pac Symp Biocomput, volume 5, pages 418–429.

    Google Scholar 

  16. Chakrabarti, D. and Faloutsos, C. (2006). Graph mining: Laws, generators, and algorithms. ACM Computing Surveys (CSUR), 38(1).

    Google Scholar 

  17. Chawathe, S. and Garcia-Molina, H. (1997). Meaningful change detection in structured data. ACM SIGMOD Record, 26(2):26–37.

    Article  Google Scholar 

  18. Chawathe, S., Rajaraman, A., Garcia-Molina, H., and Widom, J. (1996). Change detection in hierarchically structured information. In Proceedings of the 1996 ACM SIGMOD international conference on Management of data, pages 493–504. ACM New York, NY, USA.

    Chapter  Google Scholar 

  19. Chen, J., Hsu, W., Lee, M., and Ng, S. (2006). NeMoFinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 106–115. ACM New York, NY, USA.

    Chapter  Google Scholar 

  20. Chen, J., Hsu, W., Lee, M. L., and Ng, S.-K. (2007). Labeling network motifs in protein interactomes for protein function prediction. Data Engineering, International Conference on, 0:546–555.

    Google Scholar 

  21. Cheng, Y. and Church, G. (2000). Biclustering of expression data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology table of contents, pages 93–103. AAAI Press.

    Google Scholar 

  22. Chua, H., Ning, K., Sung, W., Leong, H., and Wong, L. (2007). Using indirect protein-protein interactions for protein complex prediction. In Computational Systems Bioinformatics: Proceedings of the CSB 2007 Conference, page 97. Imperial College Press.

    Google Scholar 

  23. Coatney, M. and Parthasarathy, S. (2005a). MotifMiner: Efficient discovery of common substructures in biochemical molecules. Knowledge and Information Systems, 7(2):202–223.

    Article  Google Scholar 

  24. Coatney, M. and Parthasarathy, S. (2005b). Motifminer: Efficient discovery of common substructures in biochemical molecules. Knowl. Inf. Syst., 7(2):202–223.

    Article  Google Scholar 

  25. Constantinescu, M. and Sankoff, D. (1995). An efficient algorithm for supertrees. Journal of Classification, 12(1):101–112.

    Article  MATH  Google Scholar 

  26. Cooper, C., Harrison, M., Wilkins, M., and Packer, N. (2001). GlycoSuiteDB: a new curated relational database of glycoprotein glycan structures and their biological sources. Nucleic Acids Research, 29(1):332.

    Article  Google Scholar 

  27. Dhillon, I., Guan, Y., and Kulis, B. (2005). A fast kernel-based multilevel algorithm for graph clustering. Proceedings of the 11th ACM SIGKDD, pages 629–634.

    Google Scholar 

  28. Dongen, S. (2000). Graph clustering by flow simulation. PhD thesis, PhD Thesis, University of Utrecht, The Netherlands.

    Google Scholar 

  29. Durocher, D., Taylor, I., Sarbassova, D., Haire, L., Westcott, S., Jackson, S., Smerdon, S., and Yaffe, M. (2000). The molecular basis of FHA domain: phosphopeptide binding specificity and implications for phospho-dependent signaling mechanisms. Molecular Cell, 6(5):1169–1182.

    Article  Google Scholar 

  30. Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95(25):14863–14868.

    Article  Google Scholar 

  31. Farach, M. and Thorup, M. (1994). Fast comparison of evolutionary trees. In Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms, pages 481–488. Society for Industrial and Applied Mathematics Philadelphia, PA, USA.

    Google Scholar 

  32. Fitch, W. (1971). Toward defining the course of evolution: minimum change for a specific tree topology. Systematic zoology, 20(4):406–416.

    Article  Google Scholar 

  33. Furtig, B., Richter, C., Wohnert, J., and Schwalbe, H. (2003). NMR spectroscopy of RNA. ChemBioChem, 4(10):936–962.

    Article  Google Scholar 

  34. Gan, H., Pasquali, S., and Schlick, T. (2003). Exploring the repertoire of RNA secondary motifs using graph theory; implications for RNA design. Nucleic acids research, 31(11):2926.

    Article  Google Scholar 

  35. Gardner, P. and Giegerich, R. (2004). A comprehensive comparison of comparative RNA structure prediction approaches. BMC bioinformatics, 5(1):140.

    Article  Google Scholar 

  36. Gordon, A. (1979). A measure of the agreement between rankings. Biometrika, 66(1):7–15.

    Article  MATH  MathSciNet  Google Scholar 

  37. Gordon, A. (1986). Consensus supertrees: the synthesis of rooted trees containing overlapping sets of labeled leaves. Journal of Classification, 3(2):335–348.

    Article  MATH  MathSciNet  Google Scholar 

  38. Gouda, K. and Zaki, M. (2001). Efficiently mining maximal frequent itemsets. In Proceedings of the 2001 IEEE International Conference on Data Mining, pages 163–170.

    Google Scholar 

  39. Grochow, J. and Kellis, M. (2007). Network motif discovery using subgraph enumeration and symmetry-breaking. Lecture Notes in Computer Science, 4453:92.

    Article  Google Scholar 

  40. Guignon, V., Chauve, C., and Hamel, S. (2005). An edit distance between RNA stem-loops. Lecture notes in computer science, 3772:333.

    MathSciNet  Google Scholar 

  41. Gupta, A. and Nishimura, N. (1998). Finding largest subtrees and smallest supertrees. Algorithmica, 21(2):183–210.

    Article  MATH  MathSciNet  Google Scholar 

  42. Hadzic, F., Dillon, T., Sidhu, A., Chang, E., and Tan, H. (2006). Mining substructures in protein data. In IEEE ICDM 2006 Workshop on Data Mining in Bioinformatics (DMB 2006), pages 18–22.

    Google Scholar 

  43. Hartuv, E. and Shamir, R. (2000). A clustering algorithm based on graph connectivity. Information processing letters, 76(4–6):175–181.

    Article  MATH  MathSciNet  Google Scholar 

  44. Hashimoto, K., Aoki-Kinoshita, K., Ueda, N., Kanehisa, M., and Mamitsuka, H. (2006a). A new efficient probabilistic model for mining labeled ordered trees. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 177–186.

    Google Scholar 

  45. Hashimoto, K., Goto, S., Kawano, S., Aoki-Kinoshita, K., Ueda, N., Hamajima, M., Kawasaki, T., and Kanehisa, M. (2006b). KEGG as a glycome informatics resource. Glycobiology, 16(5):63–70.

    Article  Google Scholar 

  46. Hashimoto, K., Takigawa, I., Shiga, M., Kanehisa, M., and Mamitsuka, H. (2008). Mining significant tree patterns in carbohydrate sugar chains. Bioinformatics, 24(16):i167.

    Article  Google Scholar 

  47. Henikoff, S. and Henikoff, J. (1992). Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, 89(22):10915–10919.

    Article  Google Scholar 

  48. Herget, S., Ranzinger, R., Maass, K., and Lieth, C. (2008). GlycoCT: a unifying sequence format for carbohydrates. Carbohydrate Research, 343(12):2162–2171.

    Article  Google Scholar 

  49. Hizukuri, Y., Yamanishi, Y., Nakamura, O., Yagi, F., Goto, S., and Kanehisa, M. (2005). Extraction of leukemia specific glycan motifs in humans by computational glycomics. Carbohydrate research, 340(14):2270–2278.

    Article  Google Scholar 

  50. Hochsmann, M., Voss, B., and Giegerich, R. (2004). Pure multiple RNA secondary structure alignments: a progressive profile approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 1(1):53–62.

    Article  Google Scholar 

  51. Holder, L., Cook, D., and Djoko, S. (1994). Substructure discovery in the subdue system. In Proc. of the AAAI Workshop on Knowledge Discovery in Databases, pages 169–180.

    Google Scholar 

  52. Horvath, S. and Dong, J. (2008). Geometric interpretation of gene coexpression network analysis. PLoS Computational Biology, 4(8).

    Google Scholar 

  53. Hu, H., Yan, X., Huang, Y., Han, J., and Zhou, X. (2005). Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21(1):213–221.

    Article  Google Scholar 

  54. Huan, J., Wang, W., Prins, J., and Yang, J. (2004). Spin: mining maximal frequent subgraphs from graph databases. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 581–586. ACM New York, NY, USA.

    Chapter  Google Scholar 

  55. Huang, Y., Li, H., Hu, H., Yan, X., Waterman, M., Huang, H., and Zhou, X. (2007). Systematic discovery of functional modules and context-specific functional annotation of human genome. Bioinformatics, 23(13):i222.

    Article  Google Scholar 

  56. Jiang, T., Lawler, E., and Wang, L. (1994). Aligning sequences via an evolutionary tree: complexity and approximation. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 760–769. ACM New York, NY, USA.

    Chapter  Google Scholar 

  57. Jiang, T., Lin, G., Ma, B., and Zhang, K. (2002). A general edit distance between RNA structures. Journal of Computational Biology, 9(2):371–388.

    Article  Google Scholar 

  58. Jiang, T., Wang, L., and Zhang, K. (1995). Alignment of trees: an alternative to tree edit. Theoretical Computer Science, 143(1):137–148.

    Article  MATH  MathSciNet  Google Scholar 

  59. Jin, R., Wang, C., Polshakov, D., Parthasarathy, S., and Agrawal, G. (2005). Discovering frequent topological structures from graph datasets. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 606–611. ACM New York, NY, USA.

    Chapter  Google Scholar 

  60. Karypis, G. and Kumar, V. (1999). A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359.

    Article  MATH  MathSciNet  Google Scholar 

  61. Kashtan, N., Itzkovitz, S., Milo, R., and Alon, U. (2004). Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics, 20(11):1746–1758.

    Article  Google Scholar 

  62. Kawano, S., Hashimoto, K., Miyama, T., Goto, S., and Kanehisa, M. (2005). Prediction of glycan structures from gene expression data based on glycosyltransferase reactions. Bioinformatics, 21(21):3976–3982.

    Article  Google Scholar 

  63. Keselman, D. and Amir, A. (1994). Maximum agreement subtree in a set of evolutionary trees-metrics and efficient algorithms. In Annual Symposium on Foundations of Computer Science, volume 35, pages 758–758. IEEE Computer Society Press.

    Google Scholar 

  64. Khanna, S., Motwani, R., and Yao, F. (1995). Approximation algorithms for the largest common subtree problem.

    Google Scholar 

  65. Kohonen, T. (1995). Self-organizing maps. Springer, Berlin.

    Google Scholar 

  66. Koyuturk, M., Grama, A., and Szpankowski, W. (2004a). An efficient algorithm for detecting frequent subgraphs in biological networks. Bioinformatics, 20(90001).

    Google Scholar 

  67. Koyuturk, M., Szpankowski, W., and Grama, A. (2004b). Biclustering gene-feature matrices for statistically significant dense patterns. In 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings, pages 480–484.

    Google Scholar 

  68. Kuboyama, T., Hirata, K., Aoki-Kinoshita, K., Kashima, H., and Yasuda, H. (2006). A gram distribution kernel applied to glycan classification and motif extraction. Genome Informatics Series, 17(2):25.

    Google Scholar 

  69. Le, S., Owens, J., Nussinov, R., Chen, J., Shapiro, B., and Maizel, J. (1989). RNA secondary structures: comparison and determination of frequently recurring substructures by consensus. Bioinformatics, 5(3):205–210.

    Article  Google Scholar 

  70. Lee, H., Hsu, A., Sajdak, J., Qin, J., and Pavlidis, P. (2004). Coexpression analysis of human genes across many microarray data sets. Genome Research, 14(6):1085–1094.

    Article  Google Scholar 

  71. Lemmens, K., Dhollander, T., De Bie, T., Monsieurs, P., Engelen, K., Smets, B., Winderickx, J., De Moor, B., and Marchal, K. (2006). Inferring transcriptional modules from ChIP-chip, motif and microarray data. Genome biology, 7(5):R37.

    Article  Google Scholar 

  72. Li, H., Marsolo, K., Parthasarathy, S., and Polshakov, D. (2004). A new approach to protein structure mining and alignment. Proceedings of the ACM SIGKDD Workshop on Data Mining and Bioinformatics (BIOKDD), pages 1–10.

    Google Scholar 

  73. Li, X., Foo, C., and Ng, S. (2007). Discovering protein complexes in dense reliable neighborhoods of protein interaction networks. In Computational Systems Bioinformatics: Proceedings of the CSB 2007 Conference, page 157. Imperial College Press.

    Google Scholar 

  74. Liu, N. and Wang, T. (2006). A method for rapid similarity analysis of RNA secondary structures. BMC bioinformatics, 7(1):493.

    Article  Google Scholar 

  75. Loß, A., Bunsmann, P., Bohne, A., Loß, A., Schwarzer, E., Lang, E., and Von der Lieth, C. (2002). SWEET-DB: an attempt to create annotated data collections for carbohydrates. Nucleic acids research, 30(1):405–408.

    Article  Google Scholar 

  76. MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297.

    Google Scholar 

  77. Margush, T. and McMorris, F. (1981). Consensusn-trees. Bulletin of Mathematical Biology, 43(2):239–244.

    MATH  MathSciNet  Google Scholar 

  78. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U. (2002). Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827.

    Article  Google Scholar 

  79. Mitchell, J., Cheng, J., and Collins, K. (1999). A box H/ACA small nucleolar RNA-like domain at the human telomerase RNA 3’end. Molecular and cellular biology, 19(1):567–576.

    Google Scholar 

  80. Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69:026113.

    Article  Google Scholar 

  81. Ohtsubo, K. and Marth, J. (2006). Glycosylation in cellular mechanisms of health and disease. Cell, 126(5):855–867.

    Article  Google Scholar 

  82. Onoa, B. and Tinoco, I. (2004). RNA folding and unfolding. Current Opinion in Structural Biology, 14(3):374–379.

    Article  Google Scholar 

  83. Packer, N., von der Lieth, C., Aoki-Kinoshita, K., Lebrilla, C., Paulson, J., Raman, R., Rudd, P., Sasisekharan, R., Taniguchi, N., and York, W. (2008). Frontiers in glycomics: Bioinformatics and biomarkers in disease. Proteomics, 8(1).

    Google Scholar 

  84. Pizzuti, C. and Rombo, S. (2008). Multi-functional protein clustering in ppi networks. In Bioinformatics Research and Development, pages 318–330.

    Google Scholar 

  85. Ragan, M. (1992). Phylogenetic inference based on matrix representation of trees. Molecular Phylogenetics and Evolution, 1(1):53.

    Article  Google Scholar 

  86. Ravasz, E., Somera, A., Mongru, D., Oltvai, Z., and Barabasi, A. (2002). Hierarchical organization of modularity in metabolic networks.

    Google Scholar 

  87. Sahoo, S., Thomas, C., Sheth, A., Henson, C., and York, W. (2005). GLYDE-an expressive XML standard for the representation of glycan structure. Carbohydrate research, 340(18):2802–2807.

    Article  Google Scholar 

  88. Sanderson, M., Purvis, A., and Henze, C. (1998). Phylogenetic supertrees: assembling the trees of life. Trends in Ecology & Evolution, 13(3):105–109.

    Article  Google Scholar 

  89. Satuluri, V. and Parthasarathy, S. (2009). Scalable Graph Clustering using Stochastic Flows: Applications to Community Discovery. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 737–746.

    Google Scholar 

  90. Scholkopf, B. and Smola, A. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT press.

    Google Scholar 

  91. Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature genetics, 34(2):166–176.

    Article  Google Scholar 

  92. Selkow, S. (1977). The tree-to-tree editing problem. Information processing letters, 6(6):184–186.

    Article  MATH  MathSciNet  Google Scholar 

  93. Shapiro, B. and Zhang, K. (1990). Comparing multiple RNA secondary structures using tree comparisons. Bioinformatics, 6(4):309–318.

    Article  Google Scholar 

  94. Sharan, R. and Shamir, R. (2000). CLICK: A clustering algorithm with applications to gene expression analysis. 8:307–316.

    Google Scholar 

  95. Shasha, D., Wang, J., and Zhang, S. (2004). Unordered tree mining with applications to phylogeny. In in Proceedings of International Conference on Data Engineering, pages 708–719.

    Google Scholar 

  96. Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on pattern analysis and machine intelligence, 22(8):888–905.

    Article  Google Scholar 

  97. Shih, F. and Mitchell, O. (1989). Threshold decomposition of gray-scale morphology into binarymorphology. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(1):31–42.

    Article  MATH  Google Scholar 

  98. Smith, T. and Waterman, M. (1981). Identification of common molecular subsequences. J. Mol. Bwl, 147:195–197.

    Article  Google Scholar 

  99. Sneath, S. (1973). Hierarchical clustering.

    Google Scholar 

  100. Stark, C., Breitkreutz, B., Reguly, T., Boucher, L., Breitkreutz, A., and Tyers, M. (2006). BioGRID: a general repository for interaction datasets. Nucleic acids research, 34(Database Issue):D535.

    Article  Google Scholar 

  101. Stockham, C., Wang, L., and Warnow, T. (2002). Statistically based postprocessing of phylogenetic analysis by clustering. Bioinformatics, 18(3):465–469.

    Article  Google Scholar 

  102. Stuart, J., Segal, E., Koller, D., and Kim, S. (2003a). A gene-coexpression network for global discovery of conserved genetic modules. Science, 302(5643):249–255.

    Article  Google Scholar 

  103. Stuart, J., Segal, E., Koller, D., and Kim, S. (2003b). A gene-coexpression network for global discovery of conserved genetic modules. Science, 302(5643):249–255.

    Article  Google Scholar 

  104. Tai, K. (1979). The tree-to-tree correction problem. Journal of the Association for Computing Machm© ry, 26(3):422–433.

    MATH  MathSciNet  Google Scholar 

  105. Tanay, A., Sharan, R., Kupiec, M., and Shamir, R. (2004). Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. Proceedings of the National Academy of Sciences, 101(9):2981–2986.

    Article  Google Scholar 

  106. Tanay, A., Sharan, R., and Shamir, R. (2002). Discovering statistically significant biclusters in gene expression data. Bioinformatics, 18(Suppl 1):S136–S144.

    Google Scholar 

  107. Tinoco, I. and Bustamante, C. (1999). How RNA folds. Journal of molecular biology, 293(2):271–281.

    Article  Google Scholar 

  108. Ueda, N., Aoki, K., and Mamitsuka, H. (2004). A general probabilistic framework for mining labeled ordered trees. In Proceedings of the Fourth SIAM International Conference on Data Mining, pages 357–368.

    Google Scholar 

  109. Ueda, N., Aoki-Kinoshita, K., Yamaguchi, A., Akutsu, T., and Mamitsuka, H. (2005). A probabilistic model for mining labeled ordered trees: Capturing patterns in carbohydrate sugar chains. IEEE Transactions on Knowledge and Data Engineering, 17(8):1051–1064.

    Article  Google Scholar 

  110. Valiente, G. (2002). Algorithms on trees and graphs. Springer.

    Google Scholar 

  111. Wang, C. and Parthasarathy, S. (2004). Parallel algorithms for mining frequent structural motifs in scientific data. In Proceedings of the 18th annual international conference on Supercomputing, pages 31–40. ACM New York, NY, USA.

    Chapter  Google Scholar 

  112. Wang, L., Jiang, T., and Gusfield, D. (1997). A more efficient approximation scheme for tree alignment. In Proceedings of the first annual international conference on Computational molecular biology, pages 310–319. ACM New York, NY, USA.

    Google Scholar 

  113. Wang, L., Jiang, T., and Lawler, E. (1996). Approximation algorithms for tree alignment with a given phylogeny. Algorithmica, 16(3):302–315.

    Article  MathSciNet  Google Scholar 

  114. Yamanishi, Y., Bach, F., and Vert, J. (2007). Glycan classification with tree kernels. Bioinformatics, 23(10):1211.

    Article  Google Scholar 

  115. Yan, X., Mehan, M., Huang, Y., Waterman, M., Yu, P., and Zhou, X. (2007). A graph-based approach to systematically reconstruct human tran-scriptional regulatory modules. Bioinformatics, 23(13):i577.

    Article  Google Scholar 

  116. You, C. H., Holder, L. B., and Cook, D. J. (2006). Application of graph-based data mining to metabolic pathways. Data Mining Workshops, International Conference on, 0:169–173.

    Article  Google Scholar 

  117. Zaki, M. (2005). Efficiently mining frequent trees in a forest: Algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8):1021–1035.

    Article  Google Scholar 

  118. Zhang, K. and Jiang, T. (1994). Some MAX SNP-hard results concerning unordered labeled trees. Information Processing Letters, 49(5):249–254.

    Article  MATH  MathSciNet  Google Scholar 

  119. Zhang, K. and Shasha, D. (1989). Simple fast algorithms for the editing distance between trees and related problems. SIAM journal on computing, 18:1245.

    Article  MATH  MathSciNet  Google Scholar 

  120. Zhang, S. and Wang, T. (2008). Discovering Frequent Agreement Subtrees from Phylogenetic Data. IEEE Transactions on Knowledge and Data Engineering, 20(1):68–82.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to S. Parthasarathy .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag US

About this chapter

Cite this chapter

Parthasarathy, S., Tatikonda, S., Ucar, D. (2010). A Survey of Graph Mining Techniques for Biological Datasets. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_18

Download citation

  • DOI: https://doi.org/10.1007/978-1-4419-6045-0_18

  • Published:

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-6044-3

  • Online ISBN: 978-1-4419-6045-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics