Computational and Statistical Methods in Bioinformatics

Akutsu, Tatsuya

doi:10.1007/11423270_2


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3430)

Abstract

Many computational and statistical methods have been developed and applied in bioinformatics. Recently, new approaches based on support vector machines (SVMs) have been developed; SVMs provide a way of combining computational methods with statistical methods. After reviewing fundamental computational and statistical methods in bioinformatics, this paper surveys how these methods are used with SVMs to analyze biological sequence data. The paper also reviews a method for handling chemical structures with SVMs.
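One well-known way SVMs are applied to sequence data is through a string kernel. A string kernel is a computational similarity measure between sequences, and it can be plugged into a statistical margin-based learner. The sketch below implements the k-spectrum kernel of Leslie, Eskin and Noble. This kernel compares two sequences by the inner product of their k-mer count vectors. It is an illustrative assumption, not necessarily the formulation surveyed in this paper. The function names (`spectrum_features`, `spectrum_kernel`, `normalized_kernel`, `gram_matrix`) are hypothetical, and no SVM solver is included.

```python
import math
from collections import Counter
from typing import List


def spectrum_features(seq: str, k: int) -> Counter:
    """Map a sequence to its k-spectrum: counts of every contiguous length-k substring."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of x and y."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    # Only k-mers occurring in both sequences contribute; a Counter returns 0 for absent keys.
    return sum(count * fy[kmer] for kmer, count in fx.items())


def normalized_kernel(x: str, y: str, k: int = 3) -> float:
    """Cosine-normalized kernel, so every non-empty sequence has self-similarity 1."""
    kxx, kyy = spectrum_kernel(x, x, k), spectrum_kernel(y, y, k)
    if kxx == 0 or kyy == 0:
        return 0.0  # a sequence shorter than k has an empty spectrum
    return spectrum_kernel(x, y, k) / math.sqrt(kxx * kyy)


def gram_matrix(seqs: List[str], k: int = 3) -> List[List[float]]:
    """Pairwise kernel values; a kernel-based SVM is trained on a matrix like this."""
    return [[normalized_kernel(a, b, k) for b in seqs] for a in seqs]


if __name__ == "__main__":
    # Shared 3-mers: ACG (2 x 1) and CGT (2 x 1), so the kernel value is 4.
    print(spectrum_kernel("ACGTACGT", "ACGT", k=3))
    for row in gram_matrix(["ACGTACGT", "ACGT", "TTTT"]):
        print([round(v, 3) for v in row])
```

The resulting Gram matrix can be passed to an SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Cosine normalization is a design choice. It keeps long sequences from dominating simply because they contain more k-mers.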



Author information

Authors and Affiliations

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji-city, Kyoto, 611-0011, Japan
Tatsuya Akutsu


Editor information

Editors and Affiliations

Shimane University, 89-1 Enya-cho Izumo, 6938501, Shimane, Japan
Shusaku Tsumoto
Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi Kohoku-ku, 223-8522, Yokohama, Japan
Takahira Yamaguchi
The Institute of Scientific and Industrial Research, Osaka University, Japan
Masayuki Numao
Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, 567-0047, Osaka, Japan
Hiroshi Motoda


About this paper

Cite this paper

Akutsu, T. (2005). Computational and Statistical Methods in Bioinformatics. In: Tsumoto, S., Yamaguchi, T., Numao, M., Motoda, H. (eds) Active Mining. Lecture Notes in Computer Science, vol 3430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11423270_2


DOI: https://doi.org/10.1007/11423270_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26157-5
Online ISBN: 978-3-540-31933-7
eBook Packages: Computer Science, Computer Science (R0)
