Managing and analyzing carbohydrate data

Published: 01 June 2004

Abstract

One of the most vital molecules in multicellular organisms is the carbohydrate, which is structurally important in the construction of such organisms. In fact, all cells in nature carry carbohydrate chains, or glycans, that help modulate various cell-cell events in the development of the organism. Unfortunately, informatics research on glycans has progressed slowly compared with research on DNA and proteins, largely because glycan structures are difficult to analyze biologically. Our work applies data engineering approaches to glean some understanding of the glycan data that is currently publicly available. In particular, by modeling glycans as labeled unordered trees, we have implemented a tree-matching algorithm for measuring tree similarity. Our algorithm builds on proven, efficient methods from computer science that have been extended and developed for glycan data. Moreover, since glycans are recognized by various agents in multicellular organisms, capturing the patterns that might be recognized required us to model dependencies that appear to extend beyond the directly connected nodes in a tree. Therefore, by defining glycans as labeled ordered trees, we developed a new probabilistic tree model with which sibling patterns across a tree can be mined. We present promising results from these methods that may prove useful for the future of glycome informatics.
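The abstract names the tree-matching idea without detailing it. As a rough illustration of what measuring similarity between labeled unordered trees involves, here is a minimal Python sketch. It is not the authors' algorithm. It assumes each glycan is a rooted tree whose nodes carry monosaccharide names (linkage labels on the edges are ignored), scores two glycans by the size of their largest common connected subtree, where children may be paired in any order, and turns that size into a Dice-style score. The names `Node`, `nodes`, `common_size` and `similarity`, and the example residues, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Node:
    """One residue of a glycan; `label` is a monosaccharide name (illustrative only)."""
    label: str
    children: list[Node] = field(default_factory=list)


def nodes(tree: Node) -> list[Node]:
    """Return every node of `tree` in preorder."""
    out = [tree]
    for child in tree.children:
        out.extend(nodes(child))
    return out


def common_size(a: Node, b: Node) -> int:
    """Number of nodes in the largest common subtree rooted at both a and b.

    Returns 0 if the root labels differ. Children are unordered, so every
    injective pairing of the shorter child list into the longer one is tried
    (exhaustive search; acceptable for small branching factors).
    """
    if a.label != b.label:
        return 0
    small, large = sorted((a.children, b.children), key=len)
    best = 0
    for chosen in permutations(large, len(small)):
        best = max(best, sum(common_size(x, y) for x, y in zip(small, chosen)))
    return 1 + best


def similarity(t1: Node, t2: Node) -> float:
    """Dice-style score in [0, 1]: 2 * |largest common subtree| / (|t1| + |t2|).

    The common subtree may be rooted anywhere in either tree (a local match).
    """
    n1, n2 = nodes(t1), nodes(t2)
    best = max(common_size(x, y) for x in n1 for y in n2)
    return 2 * best / (len(n1) + len(n2))


if __name__ == "__main__":
    # Two small, hypothetical N-glycan-like trees written root-first.
    t1 = Node("GlcNAc", [Node("GlcNAc", [Node("Man", [
        Node("Man", [Node("GlcNAc")]),
        Node("Man", [Node("GlcNAc")]),
    ])])])
    t2 = Node("GlcNAc", [Node("GlcNAc", [Node("Man", [
        Node("Man"),
        Node("Man", [Node("GlcNAc", [Node("Gal")])]),
    ])])])
    print(common_size(t1, t2))           # 6
    print(round(similarity(t1, t2), 3))  # 0.857
```

Because the children of each node are treated as a set, swapping the two `Man` branches of either tree leaves the score unchanged. The exhaustive pairing of children grows factorially with the branching factor; a version meant to scale would replace it with a maximum-weight bipartite matching, and linkage information could be added by comparing an edge label on each child as well.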

Information

Published In

ACM SIGMOD Record, Volume 33, Issue 2
June 2004
126 pages
ISSN:0163-5808
DOI:10.1145/1024694

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2004
Published in SIGMOD Volume 33, Issue 2

Qualifiers

  • Article

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 17 Feb 2025

Cited By

  • (2008) An efficient unordered tree kernel and its application to glycan classification. Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, pp. 184-195. DOI: 10.5555/1786574.1786595. Online publication date: 20-May-2008.
  • (2008) A new efficient probabilistic model for mining labeled ordered trees applied to glycobiology. ACM Transactions on Knowledge Discovery from Data, 2:1, pp. 1-30. DOI: 10.1145/1342320.1342326. Online publication date: 3-Apr-2008.
  • (2008) An Efficient Unordered Tree Kernel and Its Application to Glycan Classification. Advances in Knowledge Discovery and Data Mining, pp. 184-195. DOI: 10.1007/978-3-540-68125-0_18. Online publication date: 2008.
  • (2007) Sugar Folding: A Novel Structural Prediction Tool for Oligosaccharides and Polysaccharides. Journal of Chemical Theory and Computation, 3:4, pp. 1620-1628. DOI: 10.1021/ct700033y. Online publication date: 14-Jun-2007.
  • (2006) A new efficient probabilistic model for mining labeled ordered trees. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 177-186. DOI: 10.1145/1150402.1150425. Online publication date: 20-Aug-2006.
  • (2005) A Probabilistic Model for Mining Labeled Ordered Trees. IEEE Transactions on Knowledge and Data Engineering, 17:8, pp. 1051-1064. DOI: 10.1109/TKDE.2005.117. Online publication date: 1-Aug-2005.
