Tree PCA for Extracting Dominant Substructures from Labeled Rooted Trees

Yamazaki, Tomoya; Yamamoto, Akihiro; Kuboyama, Tetsuji

doi:10.1007/978-3-319-24282-8_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9356)

Included in the following conference series:

International Conference on Discovery Science

Abstract

We propose a novel principal component analysis (PCA) for rooted labeled trees that discovers dominant substructures in a collection of trees. The principal components of trees are defined in analogy to ordinary principal component analysis on numerical vectors. Our methods substantially extend earlier work, in which the input data are restricted to binary trees or to rooted unlabeled trees with unique vertex indexing, and the principal components are restricted to paths. In contrast, our extension accepts general rooted labeled trees as input and allows the principal components to take the more expressive form of subtrees instead of paths. To realize this extension, we employ flexible tree matching, namely the various mappings used in tree edit distance algorithms. Within this framework, we design an efficient algorithm based on top-down mappings and demonstrate its applicability by using it to extract dominant patterns from a set of glycan structures.
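
The abstract names top-down mappings as the tree matching behind the algorithm. As a rough illustration of what such a mapping is, and not the authors' algorithm, the Python sketch below computes the size of the largest label-preserving top-down mapping between two trees: roots are paired, a node may only be paired if its parent is paired, and sibling order is preserved. It assumes ordered trees and exact label matches with no substitution costs; for unordered trees the child alignment would instead be a maximum-weight bipartite matching. The names Node, top_down_common_size and total_support are hypothetical, total_support is a made-up score over a collection rather than the principal-component objective defined in the paper, and the two trees are toy examples with monosaccharide-like labels, not data from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of an ordered, labeled, rooted tree (hypothetical helper type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def top_down_common_size(u: Node, v: Node) -> int:
    """Size of the largest label-preserving top-down mapping between the
    subtrees rooted at u and v, with u mapped to v.

    A node pair can be mapped only if its parents are mapped, and sibling
    order must be kept, so the two child lists are aligned like a weighted
    longest common subsequence. Each node pair at equal depth is evaluated
    at most once, giving O(|T1| * |T2|) work overall.
    """
    if u.label != v.label:
        return 0  # roots with different labels cannot be mapped
    a, b = u.children, v.children
    # dp[i][j] = best total over the first i children of u and first j of v
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = max(
                dp[i - 1][j],  # leave child a[i-1] unmapped
                dp[i][j - 1],  # leave child b[j-1] unmapped
                dp[i - 1][j - 1] + top_down_common_size(a[i - 1], b[j - 1]),
            )
    return 1 + dp[len(a)][len(b)]


def total_support(pattern: Node, trees: List[Node]) -> int:
    """Hypothetical score: total number of pattern nodes that the trees
    in the collection share with the pattern under top-down mappings."""
    return sum(top_down_common_size(pattern, t) for t in trees)


# Toy trees (made up for illustration only)
t1 = Node("GlcNAc", [Node("GlcNAc", [Node("Man", [Node("Man"), Node("Man")])])])
t2 = Node("GlcNAc", [Node("GlcNAc", [Node("Man", [Node("Man")])]), Node("Fuc")])

print(top_down_common_size(t1, t2))  # 4: GlcNAc-GlcNAc-Man-Man is shared
print(total_support(t2, [t1, t2]))   # 9 = 4 (shared with t1) + 5 (all of t2)
```

Aligning children with an LCS-style table is what makes this a top-down mapping rather than a general edit mapping: a node deleted in one tree takes its whole subtree with it, which keeps the computation quadratic.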

Acknowledgments

The authors would like to thank the anonymous reviewers and Kouichi Hirata (Kyushu Institute of Technology, Japan) for their valuable comments. This work was partially supported by the Grant-in-Aid for Scientific Research (KAKENHI Grant Numbers 26280085, 26280090, and 24300060) from the Japan Society for the Promotion of Science.

Author information

Authors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Tomoya Yamazaki & Akihiro Yamamoto
Computer Centre, Gakushuin University, 1-5-1 Mejiro, Toshima-ku, Tokyo, 171-8588, Japan
Tetsuji Kuboyama

Corresponding author

Correspondence to Tomoya Yamazaki.

Editor information

Editors and Affiliations

University of Ottawa, Ottawa, Ontario, Canada
Nathalie Japkowicz
Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
Stan Matwin

About this paper

Cite this paper

Yamazaki, T., Yamamoto, A., Kuboyama, T. (2015). Tree PCA for Extracting Dominant Substructures from Labeled Rooted Trees. In: Japkowicz, N., Matwin, S. (eds) Discovery Science. DS 2015. Lecture Notes in Computer Science, vol 9356. Springer, Cham. https://doi.org/10.1007/978-3-319-24282-8_27

DOI: https://doi.org/10.1007/978-3-319-24282-8_27
Published: 25 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24281-1
Online ISBN: 978-3-319-24282-8
eBook Packages: Computer Science, Computer Science (R0)