Abstract
This paper discusses several problems in molecular biology to which learning paradigms may be applicable. As a case study, we present our recent work on knowledge discovery from amino acid sequences using the PAC-learning paradigm.
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Miyano, S. (1993). Learning theory toward Genome Informatics. In: Jantke, K.P., Kobayashi, S., Tomita, E., Yokomori, T. (eds) Algorithmic Learning Theory. ALT 1993. Lecture Notes in Computer Science, vol 744. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57370-4_34
DOI: https://doi.org/10.1007/3-540-57370-4_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57370-8
Online ISBN: 978-3-540-48096-9
eBook Packages: Springer Book Archive