Abstract
Many computational and statistical methods have been developed and applied in bioinformatics. More recently, new approaches based on support vector machines have emerged, and these provide a way of combining computational and statistical methods. After reviewing fundamental computational and statistical methods in bioinformatics, this paper surveys how these methods are used with support vector machines to analyze biological sequence data. It also outlines a method for handling chemical structures with support vector machines.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Akutsu, T. (2005). Computational and Statistical Methods in Bioinformatics. In: Tsumoto, S., Yamaguchi, T., Numao, M., Motoda, H. (eds) Active Mining. Lecture Notes in Computer Science, vol 3430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11423270_2
DOI: https://doi.org/10.1007/11423270_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26157-5
Online ISBN: 978-3-540-31933-7
eBook Packages: Computer Science (R0)