A Survey of Graph Mining Techniques for Biological Datasets

Parthasarathy, S.; Tatikonda, S.; Ucar, D.

doi:10.1007/978-1-4419-6045-0_18

S. Parthasarathy³,
S. Tatikonda³ &
D. Ucar³

Part of the book series: Advances in Database Systems ((ADBS,volume 40))

7554 Accesses
10 Citations

Abstract

Mining structured information has been the source of much research in the data mining community over the last decade. The field of bioinformatics has emerged as important application area in this context. Examples abound ranging from the analysis of protein interaction networks to the analysis of phylogenetic data. In this article we survey the principal results in the field examining them both from the algorithmic contributions and applicability in the domain in ques- tion. We conclude this article with a discussion of the key results and identify some interesting directions for future research.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 169.00; Price excludes VAT (USA)

Softcover Book: USD 219.99; Price excludes VAT (USA)

Hardcover Book: USD 219.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Akutsu, T. (1992). An RNC algorithm for finding a largest common subtree of two trees. IEICE Transactions on Information and Systems, 75(1):95–101.
Google Scholar
Aoki, K., Mamitsuka, H., Akutsu, T., and Kanehisa, M. (2005). A score matrix to reveal the hidden links in glycans. Bioinformatics, 21(8):1457–1463.
Article Google Scholar
Aoki, K., Ueda, N., Yamaguchi, A., Kanehisa, M., Akutsu, T., and Mamitsuka, H. (2004a). Application of a new probabilistic model for recognizing complex patterns in glycans.
Google Scholar
Aoki, K., Yamaguchi, A., Okuno, Y., Akutsu, T., Ueda, N., Kanehisa, M., and Mamitsuka, H. (2003). Efficient tree-matching methods for accurate carbohydrate database queries. Genome Informatics Sl, pages 134–143.
Google Scholar
Aoki, K., Yamaguchi, A., Ueda, N., Akutsu, T., Mamitsuka, H., Goto, S., and Kanehisa, M. (2004b). KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chains. Nucleic acids research, 32(Web Server Issue):W267.
Article Google Scholar
Asur, S., Ucar, D., and Parthasarathy, S. (2007). An ensemble framework for clustering protein protein interaction networks. Bioinformatics, 23(13):i29.
Article Google Scholar
Avogadri, R. and Valentini, G. (2009). Fuzzy ensemble clustering based on random projections for DNA microarray data analysis. Artificial Intelligence in Medicine, 45(2–3):173–183.
Article Google Scholar
Bader, G. and Hogue, C. (2003). An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinfomatics, 4:2.
Article Google Scholar
Bafna, V., Muthukrishnan, S., and Ravi, R. (1995). Computing similarity between RNA strings. In Combinatorial Pattern Matching (CPM), volume 937 of LNCS.
Google Scholar
Bar-Joseph, Z., Gerber, G., Lee, T., Rinaldi, N., Yoo, J., Robert, F., Gordon, D., Fraenkel, E., Jaakkola, T., Young, R., et al. (2003). Computational discovery of gene modules and regulatory networks. Nature Biotechnology, 21(11):1337–1342.
Article Google Scholar
Benedetti, G. and Morosetti, S. (1996). A graph-topological approach to recognition of pattern and similarity in RNA secondary structures. Biophysical chemistry, 59(1–2):179–184.
Article Google Scholar
Bille, P. (2005). A survey on tree edit distance and related problems. Theoretical computer science, 337(1–3):217–239.
Article MATH MathSciNet Google Scholar
Bohne-Lang, A., Lang, E., Forster, T., and von der Lieth, C. (2001). LIN-UCS: linear notation for unique description of carbohydrate sequences. Carbohydrate research, 336(1):1–11.
Article Google Scholar
Brohee, S. and van Helden, J. (2006). Evaluation of clustering algorithms for protein-protein interaction networks. BMC bioinformatics, 7(1):488.
Article Google Scholar
Butte, A. and Kohane, I. (2000). Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. In Pac Symp Biocomput, volume 5, pages 418–429.
Google Scholar
Chakrabarti, D. and Faloutsos, C. (2006). Graph mining: Laws, generators, and algorithms. ACM Computing Surveys (CSUR), 38(1).
Google Scholar
Chawathe, S. and Garcia-Molina, H. (1997). Meaningful change detection in structured data. ACM SIGMOD Record, 26(2):26–37.
Article Google Scholar
Chawathe, S., Rajaraman, A., Garcia-Molina, H., and Widom, J. (1996). Change detection in hierarchically structured information. In Proceedings of the 1996 ACM SIGMOD international conference on Management of data, pages 493–504. ACM New York, NY, USA.
Chapter Google Scholar
Chen, J., Hsu, W., Lee, M., and Ng, S. (2006). NeMoFinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 106–115. ACM New York, NY, USA.
Chapter Google Scholar
Chen, J., Hsu, W., Lee, M. L., and Ng, S.-K. (2007). Labeling network motifs in protein interactomes for protein function prediction. Data Engineering, International Conference on, 0:546–555.
Google Scholar
Cheng, Y. and Church, G. (2000). Biclustering of expression data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology table of contents, pages 93–103. AAAI Press.
Google Scholar
Chua, H., Ning, K., Sung, W., Leong, H., and Wong, L. (2007). Using indirect protein-protein interactions for protein complex prediction. In Computational Systems Bioinformatics: Proceedings of the CSB 2007 Conference, page 97. Imperial College Press.
Google Scholar
Coatney, M. and Parthasarathy, S. (2005a). MotifMiner: Efficient discovery of common substructures in biochemical molecules. Knowledge and Information Systems, 7(2):202–223.
Article Google Scholar
Coatney, M. and Parthasarathy, S. (2005b). Motifminer: Efficient discovery of common substructures in biochemical molecules. Knowl. Inf. Syst., 7(2):202–223.
Article Google Scholar
Constantinescu, M. and Sankoff, D. (1995). An efficient algorithm for supertrees. Journal of Classification, 12(1):101–112.
Article MATH Google Scholar
Cooper, C., Harrison, M., Wilkins, M., and Packer, N. (2001). GlycoSuiteDB: a new curated relational database of glycoprotein glycan structures and their biological sources. Nucleic Acids Research, 29(1):332.
Article Google Scholar
Dhillon, I., Guan, Y., and Kulis, B. (2005). A fast kernel-based multilevel algorithm for graph clustering. Proceedings of the 11th ACM SIGKDD, pages 629–634.
Google Scholar
Dongen, S. (2000). Graph clustering by flow simulation. PhD thesis, PhD Thesis, University of Utrecht, The Netherlands.
Google Scholar
Durocher, D., Taylor, I., Sarbassova, D., Haire, L., Westcott, S., Jackson, S., Smerdon, S., and Yaffe, M. (2000). The molecular basis of FHA domain: phosphopeptide binding specificity and implications for phospho-dependent signaling mechanisms. Molecular Cell, 6(5):1169–1182.
Article Google Scholar
Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95(25):14863–14868.
Article Google Scholar
Farach, M. and Thorup, M. (1994). Fast comparison of evolutionary trees. In Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms, pages 481–488. Society for Industrial and Applied Mathematics Philadelphia, PA, USA.
Google Scholar
Fitch, W. (1971). Toward defining the course of evolution: minimum change for a specific tree topology. Systematic zoology, 20(4):406–416.
Article Google Scholar
Furtig, B., Richter, C., Wohnert, J., and Schwalbe, H. (2003). NMR spectroscopy of RNA. ChemBioChem, 4(10):936–962.
Article Google Scholar
Gan, H., Pasquali, S., and Schlick, T. (2003). Exploring the repertoire of RNA secondary motifs using graph theory; implications for RNA design. Nucleic acids research, 31(11):2926.
Article Google Scholar
Gardner, P. and Giegerich, R. (2004). A comprehensive comparison of comparative RNA structure prediction approaches. BMC bioinformatics, 5(1):140.
Article Google Scholar
Gordon, A. (1979). A measure of the agreement between rankings. Biometrika, 66(1):7–15.
Article MATH MathSciNet Google Scholar
Gordon, A. (1986). Consensus supertrees: the synthesis of rooted trees containing overlapping sets of labeled leaves. Journal of Classification, 3(2):335–348.
Article MATH MathSciNet Google Scholar
Gouda, K. and Zaki, M. (2001). Efficiently mining maximal frequent itemsets. In Proceedings of the 2001 IEEE International Conference on Data Mining, pages 163–170.
Google Scholar
Grochow, J. and Kellis, M. (2007). Network motif discovery using subgraph enumeration and symmetry-breaking. Lecture Notes in Computer Science, 4453:92.
Article Google Scholar
Guignon, V., Chauve, C., and Hamel, S. (2005). An edit distance between RNA stem-loops. Lecture notes in computer science, 3772:333.
MathSciNet Google Scholar
Gupta, A. and Nishimura, N. (1998). Finding largest subtrees and smallest supertrees. Algorithmica, 21(2):183–210.
Article MATH MathSciNet Google Scholar
Hadzic, F., Dillon, T., Sidhu, A., Chang, E., and Tan, H. (2006). Mining substructures in protein data. In IEEE ICDM 2006 Workshop on Data Mining in Bioinformatics (DMB 2006), pages 18–22.
Google Scholar
Hartuv, E. and Shamir, R. (2000). A clustering algorithm based on graph connectivity. Information processing letters, 76(4–6):175–181.
Article MATH MathSciNet Google Scholar
Hashimoto, K., Aoki-Kinoshita, K., Ueda, N., Kanehisa, M., and Mamitsuka, H. (2006a). A new efficient probabilistic model for mining labeled ordered trees. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 177–186.
Google Scholar
Hashimoto, K., Goto, S., Kawano, S., Aoki-Kinoshita, K., Ueda, N., Hamajima, M., Kawasaki, T., and Kanehisa, M. (2006b). KEGG as a glycome informatics resource. Glycobiology, 16(5):63–70.
Article Google Scholar
Hashimoto, K., Takigawa, I., Shiga, M., Kanehisa, M., and Mamitsuka, H. (2008). Mining significant tree patterns in carbohydrate sugar chains. Bioinformatics, 24(16):i167.
Article Google Scholar
Henikoff, S. and Henikoff, J. (1992). Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, 89(22):10915–10919.
Article Google Scholar
Herget, S., Ranzinger, R., Maass, K., and Lieth, C. (2008). GlycoCT: a unifying sequence format for carbohydrates. Carbohydrate Research, 343(12):2162–2171.
Article Google Scholar
Hizukuri, Y., Yamanishi, Y., Nakamura, O., Yagi, F., Goto, S., and Kanehisa, M. (2005). Extraction of leukemia specific glycan motifs in humans by computational glycomics. Carbohydrate research, 340(14):2270–2278.
Article Google Scholar
Hochsmann, M., Voss, B., and Giegerich, R. (2004). Pure multiple RNA secondary structure alignments: a progressive profile approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 1(1):53–62.
Article Google Scholar
Holder, L., Cook, D., and Djoko, S. (1994). Substructure discovery in the subdue system. In Proc. of the AAAI Workshop on Knowledge Discovery in Databases, pages 169–180.
Google Scholar
Horvath, S. and Dong, J. (2008). Geometric interpretation of gene coexpression network analysis. PLoS Computational Biology, 4(8).
Google Scholar
Hu, H., Yan, X., Huang, Y., Han, J., and Zhou, X. (2005). Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21(1):213–221.
Article Google Scholar
Huan, J., Wang, W., Prins, J., and Yang, J. (2004). Spin: mining maximal frequent subgraphs from graph databases. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 581–586. ACM New York, NY, USA.
Chapter Google Scholar
Huang, Y., Li, H., Hu, H., Yan, X., Waterman, M., Huang, H., and Zhou, X. (2007). Systematic discovery of functional modules and context-specific functional annotation of human genome. Bioinformatics, 23(13):i222.
Article Google Scholar
Jiang, T., Lawler, E., and Wang, L. (1994). Aligning sequences via an evolutionary tree: complexity and approximation. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 760–769. ACM New York, NY, USA.
Chapter Google Scholar
Jiang, T., Lin, G., Ma, B., and Zhang, K. (2002). A general edit distance between RNA structures. Journal of Computational Biology, 9(2):371–388.
Article Google Scholar
Jiang, T., Wang, L., and Zhang, K. (1995). Alignment of trees: an alternative to tree edit. Theoretical Computer Science, 143(1):137–148.
Article MATH MathSciNet Google Scholar
Jin, R., Wang, C., Polshakov, D., Parthasarathy, S., and Agrawal, G. (2005). Discovering frequent topological structures from graph datasets. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 606–611. ACM New York, NY, USA.
Chapter Google Scholar
Karypis, G. and Kumar, V. (1999). A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359.
Article MATH MathSciNet Google Scholar
Kashtan, N., Itzkovitz, S., Milo, R., and Alon, U. (2004). Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics, 20(11):1746–1758.
Article Google Scholar
Kawano, S., Hashimoto, K., Miyama, T., Goto, S., and Kanehisa, M. (2005). Prediction of glycan structures from gene expression data based on glycosyltransferase reactions. Bioinformatics, 21(21):3976–3982.
Article Google Scholar
Keselman, D. and Amir, A. (1994). Maximum agreement subtree in a set of evolutionary trees-metrics and efficient algorithms. In Annual Symposium on Foundations of Computer Science, volume 35, pages 758–758. IEEE Computer Society Press.
Google Scholar
Khanna, S., Motwani, R., and Yao, F. (1995). Approximation algorithms for the largest common subtree problem.
Google Scholar
Kohonen, T. (1995). Self-organizing maps. Springer, Berlin.
Google Scholar
Koyuturk, M., Grama, A., and Szpankowski, W. (2004a). An efficient algorithm for detecting frequent subgraphs in biological networks. Bioinformatics, 20(90001).
Google Scholar
Koyuturk, M., Szpankowski, W., and Grama, A. (2004b). Biclustering gene-feature matrices for statistically significant dense patterns. In 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings, pages 480–484.
Google Scholar
Kuboyama, T., Hirata, K., Aoki-Kinoshita, K., Kashima, H., and Yasuda, H. (2006). A gram distribution kernel applied to glycan classification and motif extraction. Genome Informatics Series, 17(2):25.
Google Scholar
Le, S., Owens, J., Nussinov, R., Chen, J., Shapiro, B., and Maizel, J. (1989). RNA secondary structures: comparison and determination of frequently recurring substructures by consensus. Bioinformatics, 5(3):205–210.
Article Google Scholar
Lee, H., Hsu, A., Sajdak, J., Qin, J., and Pavlidis, P. (2004). Coexpression analysis of human genes across many microarray data sets. Genome Research, 14(6):1085–1094.
Article Google Scholar
Lemmens, K., Dhollander, T., De Bie, T., Monsieurs, P., Engelen, K., Smets, B., Winderickx, J., De Moor, B., and Marchal, K. (2006). Inferring transcriptional modules from ChIP-chip, motif and microarray data. Genome biology, 7(5):R37.
Article Google Scholar
Li, H., Marsolo, K., Parthasarathy, S., and Polshakov, D. (2004). A new approach to protein structure mining and alignment. Proceedings of the ACM SIGKDD Workshop on Data Mining and Bioinformatics (BIOKDD), pages 1–10.
Google Scholar
Li, X., Foo, C., and Ng, S. (2007). Discovering protein complexes in dense reliable neighborhoods of protein interaction networks. In Computational Systems Bioinformatics: Proceedings of the CSB 2007 Conference, page 157. Imperial College Press.
Google Scholar
Liu, N. and Wang, T. (2006). A method for rapid similarity analysis of RNA secondary structures. BMC bioinformatics, 7(1):493.
Article Google Scholar
Loß, A., Bunsmann, P., Bohne, A., Loß, A., Schwarzer, E., Lang, E., and Von der Lieth, C. (2002). SWEET-DB: an attempt to create annotated data collections for carbohydrates. Nucleic acids research, 30(1):405–408.
Article Google Scholar
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297.
Google Scholar
Margush, T. and McMorris, F. (1981). Consensusn-trees. Bulletin of Mathematical Biology, 43(2):239–244.
MATH MathSciNet Google Scholar
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U. (2002). Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827.
Article Google Scholar
Mitchell, J., Cheng, J., and Collins, K. (1999). A box H/ACA small nucleolar RNA-like domain at the human telomerase RNA 3’end. Molecular and cellular biology, 19(1):567–576.
Google Scholar
Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69:026113.
Article Google Scholar
Ohtsubo, K. and Marth, J. (2006). Glycosylation in cellular mechanisms of health and disease. Cell, 126(5):855–867.
Article Google Scholar
Onoa, B. and Tinoco, I. (2004). RNA folding and unfolding. Current Opinion in Structural Biology, 14(3):374–379.
Article Google Scholar
Packer, N., von der Lieth, C., Aoki-Kinoshita, K., Lebrilla, C., Paulson, J., Raman, R., Rudd, P., Sasisekharan, R., Taniguchi, N., and York, W. (2008). Frontiers in glycomics: Bioinformatics and biomarkers in disease. Proteomics, 8(1).
Google Scholar
Pizzuti, C. and Rombo, S. (2008). Multi-functional protein clustering in ppi networks. In Bioinformatics Research and Development, pages 318–330.
Google Scholar
Ragan, M. (1992). Phylogenetic inference based on matrix representation of trees. Molecular Phylogenetics and Evolution, 1(1):53.
Article Google Scholar
Ravasz, E., Somera, A., Mongru, D., Oltvai, Z., and Barabasi, A. (2002). Hierarchical organization of modularity in metabolic networks.
Google Scholar
Sahoo, S., Thomas, C., Sheth, A., Henson, C., and York, W. (2005). GLYDE-an expressive XML standard for the representation of glycan structure. Carbohydrate research, 340(18):2802–2807.
Article Google Scholar
Sanderson, M., Purvis, A., and Henze, C. (1998). Phylogenetic supertrees: assembling the trees of life. Trends in Ecology & Evolution, 13(3):105–109.
Article Google Scholar
Satuluri, V. and Parthasarathy, S. (2009). Scalable Graph Clustering using Stochastic Flows: Applications to Community Discovery. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 737–746.
Google Scholar
Scholkopf, B. and Smola, A. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT press.
Google Scholar
Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature genetics, 34(2):166–176.
Article Google Scholar
Selkow, S. (1977). The tree-to-tree editing problem. Information processing letters, 6(6):184–186.
Article MATH MathSciNet Google Scholar
Shapiro, B. and Zhang, K. (1990). Comparing multiple RNA secondary structures using tree comparisons. Bioinformatics, 6(4):309–318.
Article Google Scholar
Sharan, R. and Shamir, R. (2000). CLICK: A clustering algorithm with applications to gene expression analysis. 8:307–316.
Google Scholar
Shasha, D., Wang, J., and Zhang, S. (2004). Unordered tree mining with applications to phylogeny. In in Proceedings of International Conference on Data Engineering, pages 708–719.
Google Scholar
Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on pattern analysis and machine intelligence, 22(8):888–905.
Article Google Scholar
Shih, F. and Mitchell, O. (1989). Threshold decomposition of gray-scale morphology into binarymorphology. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(1):31–42.
Article MATH Google Scholar
Smith, T. and Waterman, M. (1981). Identification of common molecular subsequences. J. Mol. Bwl, 147:195–197.
Article Google Scholar
Sneath, S. (1973). Hierarchical clustering.
Google Scholar
Stark, C., Breitkreutz, B., Reguly, T., Boucher, L., Breitkreutz, A., and Tyers, M. (2006). BioGRID: a general repository for interaction datasets. Nucleic acids research, 34(Database Issue):D535.
Article Google Scholar
Stockham, C., Wang, L., and Warnow, T. (2002). Statistically based postprocessing of phylogenetic analysis by clustering. Bioinformatics, 18(3):465–469.
Article Google Scholar
Stuart, J., Segal, E., Koller, D., and Kim, S. (2003a). A gene-coexpression network for global discovery of conserved genetic modules. Science, 302(5643):249–255.
Article Google Scholar
Stuart, J., Segal, E., Koller, D., and Kim, S. (2003b). A gene-coexpression network for global discovery of conserved genetic modules. Science, 302(5643):249–255.
Article Google Scholar
Tai, K. (1979). The tree-to-tree correction problem. Journal of the Association for Computing Machm© ry, 26(3):422–433.
MATH MathSciNet Google Scholar
Tanay, A., Sharan, R., Kupiec, M., and Shamir, R. (2004). Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. Proceedings of the National Academy of Sciences, 101(9):2981–2986.
Article Google Scholar
Tanay, A., Sharan, R., and Shamir, R. (2002). Discovering statistically significant biclusters in gene expression data. Bioinformatics, 18(Suppl 1):S136–S144.
Google Scholar
Tinoco, I. and Bustamante, C. (1999). How RNA folds. Journal of molecular biology, 293(2):271–281.
Article Google Scholar
Ueda, N., Aoki, K., and Mamitsuka, H. (2004). A general probabilistic framework for mining labeled ordered trees. In Proceedings of the Fourth SIAM International Conference on Data Mining, pages 357–368.
Google Scholar
Ueda, N., Aoki-Kinoshita, K., Yamaguchi, A., Akutsu, T., and Mamitsuka, H. (2005). A probabilistic model for mining labeled ordered trees: Capturing patterns in carbohydrate sugar chains. IEEE Transactions on Knowledge and Data Engineering, 17(8):1051–1064.
Article Google Scholar
Valiente, G. (2002). Algorithms on trees and graphs. Springer.
Google Scholar
Wang, C. and Parthasarathy, S. (2004). Parallel algorithms for mining frequent structural motifs in scientific data. In Proceedings of the 18th annual international conference on Supercomputing, pages 31–40. ACM New York, NY, USA.
Chapter Google Scholar
Wang, L., Jiang, T., and Gusfield, D. (1997). A more efficient approximation scheme for tree alignment. In Proceedings of the first annual international conference on Computational molecular biology, pages 310–319. ACM New York, NY, USA.
Google Scholar
Wang, L., Jiang, T., and Lawler, E. (1996). Approximation algorithms for tree alignment with a given phylogeny. Algorithmica, 16(3):302–315.
Article MathSciNet Google Scholar
Yamanishi, Y., Bach, F., and Vert, J. (2007). Glycan classification with tree kernels. Bioinformatics, 23(10):1211.
Article Google Scholar
Yan, X., Mehan, M., Huang, Y., Waterman, M., Yu, P., and Zhou, X. (2007). A graph-based approach to systematically reconstruct human tran-scriptional regulatory modules. Bioinformatics, 23(13):i577.
Article Google Scholar
You, C. H., Holder, L. B., and Cook, D. J. (2006). Application of graph-based data mining to metabolic pathways. Data Mining Workshops, International Conference on, 0:169–173.
Article Google Scholar
Zaki, M. (2005). Efficiently mining frequent trees in a forest: Algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8):1021–1035.
Article Google Scholar
Zhang, K. and Jiang, T. (1994). Some MAX SNP-hard results concerning unordered labeled trees. Information Processing Letters, 49(5):249–254.
Article MATH MathSciNet Google Scholar
Zhang, K. and Shasha, D. (1989). Simple fast algorithms for the editing distance between trees and related problems. SIAM journal on computing, 18:1245.
Article MATH MathSciNet Google Scholar
Zhang, S. and Wang, T. (2008). Discovering Frequent Agreement Subtrees from Phylogenetic Data. IEEE Transactions on Knowledge and Data Engineering, 20(1):68–82.
Article Google Scholar

Download references

Author information

Authors and Affiliations

The Ohio State University, 2015 Neil Ave, DL395, Columbus, OH, USA
S. Parthasarathy, S. Tatikonda & D. Ucar

Authors

S. Parthasarathy
View author publications
You can also search for this author in PubMed Google Scholar
S. Tatikonda
View author publications
You can also search for this author in PubMed Google Scholar
D. Ucar
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to S. Parthasarathy .

Editor information

Editors and Affiliations

Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, U.S.A.
Charu C. Aggarwal
Microsoft Research Asia, Zhichun Road 49, Beijing, 100080, China, People's Republic
Haixun Wang

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Parthasarathy, S., Tatikonda, S., Ucar, D. (2010). A Survey of Graph Mining Techniques for Biological Datasets. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_18

Download citation

DOI: https://doi.org/10.1007/978-1-4419-6045-0_18
Published: 18 January 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics