Abstract
We describe a running system that embodies efficient parallel implementations of several symbolic machine learning induction operators. It functions as an “Induction Assistant” to a domain expert. First, we developed an efficient, noise-tolerant, similarity-based parallel matching algorithm. The algorithm should apply to other graph-based representations of domains possessing an embedding in which the low-level features (relations or groupings) are mostly local. It was then used as infrastructure to construct efficient parallel implementations of several symbolic machine learning induction operators. Finally, the induction operators were sandwiched together with sets of filters (both syntactic and empirical) to compose a crude form of induction scripts, which are invoked by a domain expert. The matching algorithm has very attractive scaling properties as the size of the problem and/or the number of processors increases. Hardware usage is efficient. The results reported in this article were obtained on an 8K CM-2 Connection Machine. The implemented system was used to discover something previously unknown to the domain expert [47].
For us, the key contribution of this work is its demonstration of the scalability of the algorithms involved: The time complexity of every algorithm reported here is nearly independent of the size of the data, provided sufficient parallel hardware is available (subject to discussion about the instance-ID and characteristic set bit-vectors).
Reprinted, with permission, from Proceedings of the 24th Annual Hawaii International Conf. on System Sciences (HICSS'24), Kauai, Hawaii, Jan 8–11, 1991, pp. 585–594.
References
Abarbanel, R. M. (1984), Protein Structural Knowledge Engineering, Ph.D. thesis, University of California, San Francisco.
Bradley, M., T. Smith, R. Lathrop, D. Livingston, and T. Webster (1987), “Consensus Topography in the ATP Binding Site of the Simian Virus 40 and Polyomavirus Large Tumor Antigens,” Proc. Natl. Acad. Sciences USA, 84:4026–4030.
Cohen, F. E., R. M. Abarbanel, I. D. Kuntz, and R. J. Fletterick (1986), “Turn Prediction in Proteins Using a Pattern-Matching Approach,” Biochemistry, 25:266–275.
Cohen, F. E., and I. D. Kuntz (1989), “Tertiary Structure Predictions,” in Prediction of Protein Structure and the Principles of Protein Conformation, G. D. Fasman (ed.), Plenum Press, New York, pp. 647–706.
Collins, J. F., and A. F. Coulson (1984), “Applications of Parallel Processing Algorithms for DNA Sequence Analysis,” Nucl. Acids Res., 12:181–192.
Drescher, G. L. (1989), A Mechanism for Early Piagetian Learning, Ph.D. thesis, Massachusetts Institute of Technology, Cambridge.
Farmer, J. and N. Packard (1986), “The Immune System, Adaptation, and Machine Learning,” Physica, 22D:187–204.
Figge, J., T. Webster, T. Smith, and E. Paucha (1988), “Prediction of Similar Transforming Region in Simian Virus 40 Large T, Adenovirus E1A, and myc Oncoproteins,” J. Virology, 62(5):1814–1818.
Figge, J., and T. Smith (1988), “Cell-Division Sequence Motif,” Nature, 334:109.
Friedland, P., and L. Kedes (1985), “Discovering the Secrets of DNA,” Computer, 18(11):49–69.
Friedrichs, M., and P. Wolynes (1989), “Toward Protein Tertiary Structure Recognition by Means of Associative Memory Hamiltonians,” Science, 246:371–373.
Gascuel, O., and A. Danchin (1986), “Protein Export in Prokaryotes and Eukaryotes: Indications of a Difference in the Mechanism of Exportation,” J. Mol. Evol., 24:130–142.
Goldsborough, M. D., D. DiSilvestre, G. F. Temple, and A. T. Lorincz (1989), “Nucleotide Sequence of Human Papilloma Virus Type 31: A Cervical Neoplasia-Associated Virus,” Virology, 171:306–311.
Hayes-Roth, B., et al. (1986), “PROTEAN: Deriving Protein Structure from Constraints,” in Proc. Fifth Natl. Conf. on Artificial Intelligence, pp. 904–909.
Hillis, W. D. (1986), The Connection Machine, MIT Press, Cambridge, MA.
Holland, J., K. Holyoak, R. Nisbett, and P. Thagard (1986), Induction: Processes of Inference, Learning, and Discovery, MIT Press, Cambridge, MA, USA.
Holley, L. H. and M. Karplus (1989), “Protein Structure Prediction With a Neural Network,” Proc. Natl. Acad. Sciences USA, 86:152–156.
Hunter, L. E. (1989), Knowledge Acquisition Planning: Gaining Expertise Through Experience, Ph.D. thesis, Yale University.
Karp, P. and P. Friedland (1989), “Coordinating the Use of Qualitative and Quantitative Knowledge in Declarative Device Modeling,” in Widman, L. E., D. H. Helman, and K. Loparo (eds.), Artificial Intelligence, Modeling and Simulation, John Wiley and Sons.
Koile, K. and C. Overton (1989), “A Qualitative Model for Gene Expression,” Proc. 1989 Summer Computer Simulation Conf., Soc. for Computer Simulation.
Kolata, G. (1986), “Trying to Crack the Second Half of the Genetic Code,” Science, 233:1037–1039.
Lander, E., J. Mesirov, and W. Taylor (1988), “Study of Protein Sequence Comparison Metrics on the Connection Machine CM-2,” Proc. Supercomputing'88.
Lathrop, R. H. (1990), Efficient Methods For Massively Parallel Symbolic Induction: Algorithms and Implementation, Ph.D. thesis, Massachusetts Institute of Technology.
Lathrop, R. H., T. A. Webster, and T. F. Smith (1987a), “ARIADNE: Pattern-Directed Inference and Hierarchical Abstraction in Protein Structure Recognition,” Comm. of the ACM, 30(11):909–921.
Maryanski, F. J., and T. L. Booth (1977), “Inference of Finite-State Probabilistic Grammars,” IEEE Trans. on Computers, C-26(6):521–536.
Michalski, R. S., J. G. Carbonell, and T. M. Mitchell (1983), (eds.) Machine Learning: An Artificial Intelligence Approach, (first in a series), Tioga Press, Palo Alto, CA.
Minsky, M. (1986), The Society of Mind, Simon and Schuster.
Mitchell, T. M. (1977), “Version Spaces: A Candidate Elimination Approach to Rule Learning,” Proc. Fifth Intl. Joint Conf. on Artificial Intelligence, Cambridge, MA, pp. 305–310.
Qian, N. and T. Sejnowski (1988), “Predicting the Secondary Structure of Globular Proteins Using Neural Network Models,” J. Mol. Biol., 202:865–884.
Quinlan, J. R., and R. L. Rivest (1989), “Inferring Decision Trees Using the Minimum Description Length Principle,” Information and Computation, March, 80(3):227–248.
Richardson, J. (1981), “The Anatomy and Taxonomy of Protein Structure,” Advances in Protein Chemistry, 34:167–339.
Rumelhart et al. (1986), Parallel Distributed Processing, volume 1, MIT Press, Cambridge, MA.
Sankoff, D. and J. B. Kruskal (1983), (eds.) Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, Reading, MA, USA.
Searls, D. B. (1988), “Representing Genetic Information with Formal Grammars,” in Proc. of the Seventh Natl. Conf. on Artificial Intelligence, pp. 386–391.
Smith, R. F. and T. F. Smith (1990), “Automatic Generation of Primary Sequence Patterns from Sets of Related Protein Sequences,” Proc. Natl. Acad. Sci. USA, 87:118–122, Jan.
Smith, T. F. and M. S. Waterman (1981), “Identification of Common Molecular Subsequences,” J. Mol. Biol., 147:195–197.
Steele, G. L. (1984), Common LISP: The Manual, Digital Press, Billerica, MA, USA.
Tambe, M., D. Kalp, A. Gupta, C. Forgy, B. Milnes, and A. Newell (1988), “Soar/PSM-E: Investigating Match Parallelism in a Learning Production System,” Proc. Parallel Programming Environments Applications Languages and Systems.
Taylor, W. (1987), “Identification of Protein Sequence Homology by Consensus Template Alignment,” J. Mol. Biol., 188:233–258.
Thinking Machines Corp. (1988), Paris Reference Manual, Cambridge, MA, USA.
Valiant, L. G. (1984), “A Theory of the Learnable,” Comm. of the ACM, 27(11):1134–1142.
Vitter, J. S. and J. H. Lin (1988), “Learning in Parallel,” in Proc. 1988 Workshop on Computational Learning Theory (COLT'88), pp. 106–124, ed. D. Haussler and L. Pitt.
Waterman, M. S. (1984), “General Methods of Sequence Comparison,” Bull. of Math. Biol., 46:473–500.
Webster, T. A., R. H. Lathrop, and T. F. Smith (1987), “Prediction of a Common Structural Domain in Aminoacyl-tRNA Synthetases Through Use of a New Pattern-Directed Inference System,” Biochemistry, 26:6950–6957.
Webster, T. A., R. H. Lathrop, and T. F. Smith (1988), “Pattern Descriptors and the Unidentified Reading Frame 6 Human mtDNA Dinucleotide-Binding Site,” Proteins, 3(2):97–101.
Webster, T. A., R. Patarca, R. H. Lathrop, and T. F. Smith (1989), “Potential Structural Motifs in Reverse Transcriptases,” Mol. Biol. Evol., 6(3):317–320.
Webster, T. A., R. H. Lathrop, P. H. Winston, and T. F. Smith (1990), “DNA- and RNA-directed DNA Polymerase Common Structural Motif,” (submitted).
Winston, P. H., T. O. Binford, B. Katz, and M. Lowry (1983), “Learning Physical Descriptions from Functional Descriptions, Examples, and Precedents,” in Proc. of the Natl. Conf. on Artificial Intelligence, (Washington, D. C., Aug. 22–26), William Kaufmann, Los Altos, CA, pp. 433–439.
Winston, P. H. (1984), Artificial Intelligence, 2nd ed., Addison-Wesley, Reading, MA, USA.
Winston, P. H., and Rao, S. (1990), “Repairing Learned Knowledge using Experience,” in Artificial Intelligence at MIT: Expanding Frontiers, edited by Patrick H. Winston with Sarah A. Shellard, MIT Press, Cambridge, MA, in press.
Zhang, X., D. Waltz, and J. Mesirov (1989), “Protein Structure Prediction by a Data-level Parallel Algorithm,” Proc. Supercomputing'89, Nov. 13–17, Reno, NV, USA, pp. 215–223.
Zhu, Q., T. F. Smith, R. H. Lathrop, and J. Figge (1990), “The Acid Helix-Turn Activator Motif,” Proteins, 8:156–163.
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lathrop, R.H., Webster, T.A., Smith, T.F., Winston, P.H. (1993). Massively parallel symbolic induction of protein structure/function relationships. In: Hanson, S.J., Remmele, W., Rivest, R.L. (eds) Machine Learning: From Theory to Applications. Lecture Notes in Computer Science, vol 661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56483-7_30
DOI: https://doi.org/10.1007/3-540-56483-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56483-6
Online ISBN: 978-3-540-47568-2
eBook Packages: Springer Book Archive