Managing and analyzing carbohydrate data

Authors:

Kiyoko F. Aoki,

Atsuko Yamaguchi,

Tatsuya Akutsu,

Minoru Kanehisa,

Hiroshi Mamitsuka

ACM SIGMOD Record, Volume 33, Issue 2

Pages 33 - 38

https://doi.org/10.1145/1024694.1024700

Published: 01 June 2004

Abstract

Carbohydrates are among the most vital molecules in multicellular organisms, where they play an important structural role. In fact, all cells in nature carry carbohydrate sugar chains, or glycans, that help modulate various cell-cell events during the development of the organism. Unfortunately, informatics research on glycans has been slow compared with that on DNA and proteins, largely because glycan structures are difficult to analyze biologically. Our work applies data engineering approaches to gain some understanding of the glycan data that is currently publicly available. In particular, by modeling glycans as labeled unordered trees, we have implemented a tree-matching algorithm for measuring tree similarity. Our algorithm builds on proven, efficient methodologies from computer science that have been extended and developed for glycan data. Moreover, glycans are recognized by various agents in multicellular organisms, so to capture the patterns that might be recognized, we needed to capture dependencies that appear to range beyond the directly connected nodes in a tree. Therefore, by defining glycans as labeled ordered trees, we developed a new probabilistic tree model with which sibling patterns across a tree can be mined. We provide promising results from our methodologies that could prove useful for the future of glycome informatics.
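
To make the idea of similarity between labeled unordered trees more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a glycan is reduced to a rooted tree whose node labels are monosaccharide names, ignores linkage information, and scores two trees by the size of their largest common top-down subtree. The helper names (`node`, `rooted_match`, `similarity`) and the example structures are hypothetical, and the brute-force child assignment is only practical because each node is assumed to have very few children.

```python
from itertools import permutations

# A tree node is a (label, children) pair; the order of children is ignored.
def node(label, *children):
    return (label, list(children))

def rooted_match(u, v):
    """Size of the largest common subtree that maps node u onto node v."""
    if u[0] != v[0]:
        return 0
    # Assign each child on the smaller side to a distinct child on the
    # larger side and keep the best total (brute force, small degrees only).
    small, large = sorted((u[1], v[1]), key=len)
    best = 0
    for perm in permutations(large, len(small)):
        total = sum(rooted_match(a, b) for a, b in zip(small, perm))
        best = max(best, total)
    return 1 + best

def nodes(tree):
    """Yield every node of the tree, root first."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def size(tree):
    return sum(1 for _ in nodes(tree))

def similarity(t1, t2):
    """Largest common subtree over all node pairs, scaled to [0, 1]."""
    best = max(rooted_match(u, v) for u in nodes(t1) for v in nodes(t2))
    return best / max(size(t1), size(t2))

# Two hypothetical glycan-like trees that differ by one fucose branch
# and by the order in which the mannose branches are listed.
t1 = node("GlcNAc",
          node("GlcNAc",
               node("Man", node("Man"), node("Man", node("GlcNAc")))))
t2 = node("GlcNAc",
          node("GlcNAc",
               node("Man", node("Man", node("GlcNAc")), node("Man"))),
          node("Fuc"))

print(rooted_match(t1, t2))  # common subtree size when the roots are aligned
print(similarity(t1, t2))    # normalized score
```

Because children are matched as a set, listing the two mannose branches in a different order does not change the score; only the extra fucose branch in the second tree lowers it. A practical implementation would replace the permutation search with a polynomial-time bipartite matching step.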

Published In

ACM SIGMOD Record, Volume 33, Issue 2

June 2004

126 pages

ISSN: 0163-5808

DOI: 10.1145/1024694

Copyright © 2004 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States
