Massively parallel symbolic induction of protein structure/function relationships

Machine Learning: From Theory to Applications

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 661)

Abstract

We describe a running system that embodies efficient parallel implementations of several symbolic machine learning induction operators; it functions as an “Induction Assistant” to a domain expert. First, we developed an efficient, noise-tolerant, similarity-based parallel matching algorithm. It should also apply to other graph-based representations of domains that possess an embedding in which the low-level features (relations or groupings) are mostly local. This matcher then served as the infrastructure on which the parallel induction operators were built. Finally, the induction operators were combined with sets of filters (both syntactic and empirical) to compose a crude form of induction script, which the domain expert invokes. The matching algorithm scales well as the problem size, the number of processors, or both increase, and it uses the hardware efficiently. The results reported in this article were obtained on an 8K-processor CM-2 Connection Machine. The system was used to discover something previously unknown to the domain expert [47].
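As an illustration only, a noise-tolerant, similarity-based match can be sketched in Python as follows. This is not the authors' CM-2 implementation: it assumes a hypothetical pattern representation in which each position lists the symbols (for example, amino acid one-letter codes) it accepts, and it accepts a match when at most `max_mismatches` positions fail. Every alignment offset is scored independently of the others, which is the property that would let a data-parallel machine assign one (virtual) processor per offset; here a list comprehension stands in for that parallel step. The names `match_scores`, `noisy_matches`, and `PATTERN` are hypothetical.

```python
from typing import List, Set


def match_scores(sequence: str, pattern: List[Set[str]]) -> List[int]:
    """Count the pattern positions satisfied at each alignment offset.

    Each offset is scored independently of every other offset, so on SIMD
    hardware each offset could be handled by its own (virtual) processor.
    Returns an empty list if the sequence is shorter than the pattern.
    """
    n, m = len(sequence), len(pattern)
    return [
        sum(1 for j, allowed in enumerate(pattern) if sequence[i + j] in allowed)
        for i in range(n - m + 1)
    ]


def noisy_matches(sequence: str, pattern: List[Set[str]], max_mismatches: int) -> List[int]:
    """Offsets where the pattern matches with at most max_mismatches failed positions."""
    m = len(pattern)
    return [i for i, score in enumerate(match_scores(sequence, pattern))
            if m - score <= max_mismatches]


# Hypothetical 4-position pattern: G, then A or G, then K, then S or T.
PATTERN = [{"G"}, {"A", "G"}, {"K"}, {"S", "T"}]
print(match_scores("GAKSGGRT", PATTERN))      # [4, 0, 0, 1, 3]
print(noisy_matches("GAKSGGRT", PATTERN, 0))  # [0]
print(noisy_matches("GAKSGGRT", PATTERN, 1))  # [0, 4]
```

With no mismatches allowed, only the exact occurrence at offset 0 is reported; tolerating one mismatch also admits the near-occurrence at offset 4, which is the kind of noise tolerance the abstract refers to.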

For us, the key contribution of this work is its demonstration that the algorithms involved scale: the time complexity of every algorithm reported here is nearly independent of the size of the data, provided sufficient parallel hardware is available (subject to the caveats discussed for the instance-ID and characteristic-set bit-vectors).
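The abstract does not define the instance-ID and characteristic-set bit-vectors, so the following Python sketch rests on an explicit assumption: that each candidate pattern records the training instances it matches as a bit-vector with one bit per instance (Python's arbitrary-precision `int` stands in for the vector). Under that assumption, finding the instances covered by two patterns at once is a bitwise AND, a constant number of word operations only while the vector fits in a fixed number of machine words; the number of words grows linearly with the number of instances, which is one plausible reading of why the scalability claim carries this caveat. All function names are hypothetical.

```python
from typing import Iterable, List


def coverage_vector(instance_ids: Iterable[int]) -> int:
    """Pack instance IDs into a bit-vector: bit i is set iff instance i matched."""
    vector = 0
    for i in instance_ids:
        vector |= 1 << i
    return vector


def shared_coverage(a: int, b: int) -> int:
    """Instances matched by both patterns, computed with one bitwise AND."""
    return a & b


def covered_ids(vector: int) -> List[int]:
    """Unpack a bit-vector back into a sorted list of instance IDs."""
    return [i for i in range(vector.bit_length()) if (vector >> i) & 1]


def words_per_vector(num_instances: int, word_bits: int = 32) -> int:
    """Fixed-width words needed per vector; grows linearly with the instance count."""
    return -(-num_instances // word_bits)  # ceiling division


p1 = coverage_vector([0, 2, 3, 5])  # hypothetical pattern matching instances 0, 2, 3, 5
p2 = coverage_vector([2, 3, 4])     # hypothetical pattern matching instances 2, 3, 4
print(covered_ids(shared_coverage(p1, p2)))  # [2, 3]
print(words_per_vector(1000))                # 32
```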

Reprinted, with permission, from Proceedings of the 24th Annual Hawaii International Conf. on System Sciences (HICSS'24), Kauai, Hawaii, Jan 8–11, 1991, pp. 585–594.

References

  1. Abarbanel, R. M. (1984), Protein Structural Knowledge Engineering, Ph.D. thesis, University of California, San Francisco.

  2. Bradley, M., T. Smith, R. Lathrop, D. Livingston, and T. Webster (1987), “Consensus Topography in the ATP Binding Site of the Simian Virus 40 and Polyomavirus Large Tumor Antigens,” Proc. Natl. Acad. Sci. USA, 84:4026–4030.

  3. Cohen, F. E., R. M. Abarbanel, I. D. Kuntz, and R. J. Fletterick (1986), “Turn Prediction in Proteins Using a Pattern-Matching Approach,” Biochemistry, 25:266–275.

  4. Cohen, F. E., and I. D. Kuntz (1989), “Tertiary Structure Predictions,” in Prediction of Protein Structure and the Principles of Protein Conformation, G. D. Fasman (ed.), Plenum Press, New York, pp. 647–706.

  5. Collins, J. F., and A. F. Coulson (1984), “Applications of Parallel Processing Algorithms for DNA Sequence Analysis,” Nucl. Acids Res., 12:181–192.

  6. Drescher, G. L. (1989), A Mechanism for Early Piagetian Learning, Ph.D. thesis, Massachusetts Institute of Technology, Cambridge.

  7. Farmer, J., and N. Packard (1986), “The Immune System, Adaptation, and Machine Learning,” Physica, 22D:187–204.

  8. Figge, J., T. Webster, T. Smith, and E. Paucha (1988), “Prediction of Similar Transforming Region in Simian Virus 40 Large T, Adenovirus E1A, and myc Oncoproteins,” J. Virology, 62(5):1814–1818.

  9. Figge, J., and T. Smith (1988), “Cell-Division Sequence Motif,” Nature, 334:109.

  10. Friedland, P., and L. Kedes (1985), “Discovering the Secrets of DNA,” Computer, 18(11):49–69.

  11. Friedrichs, M., and P. Wolynes (1989), “Toward Protein Tertiary Structure Recognition by Means of Associative Memory Hamiltonians,” Science, 246:371–373.

  12. Gascuel, O., and A. Danchin (1986), “Protein Export in Prokaryotes and Eukaryotes: Indications of a Difference in the Mechanism of Exportation,” J. Mol. Evol., 24:130–142.

  13. Goldsborough, M. D., D. DiSilvestre, G. F. Temple, and A. T. Lorincz (1989), “Nucleotide Sequence of Human Papilloma Virus Type 31: A Cervical Neoplasia-Associated Virus,” Virology, 171:306–311.

  14. Hayes-Roth, B., et al. (1986), “PROTEAN: Deriving Protein Structure from Constraints,” in Proc. Fifth Natl. Conf. on Artificial Intelligence, pp. 904–909.

  15. Hillis, W. D. (1986), The Connection Machine, MIT Press, Cambridge, MA.

  16. Holland, J., K. Holyoak, R. Nisbett, and P. Thagard (1986), Induction: Processes of Inference, Learning, and Discovery, MIT Press, Cambridge, MA, USA.

  17. Holley, L. H., and M. Karplus (1989), “Protein Structure Prediction With a Neural Network,” Proc. Natl. Acad. Sci. USA, 86:152–156.

  18. Hunter, L. E. (1989), Knowledge Acquisition Planning: Gaining Expertise Through Experience, Ph.D. thesis, Yale University.

  19. Karp, P., and P. Friedland (1989), “Coordinating the Use of Qualitative and Quantitative Knowledge in Declarative Device Modeling,” in L. E. Widman, D. H. Helman, and K. Loparo (eds.), Artificial Intelligence, Modeling and Simulation, John Wiley and Sons.

  20. Koile, K., and C. Overton (1989), “A Qualitative Model for Gene Expression,” Proc. 1989 Summer Computer Simulation Conf., Soc. for Computer Simulation.

  21. Kolata, G. (1986), “Trying to Crack the Second Half of the Genetic Code,” Science, 233:1037–1039.

  22. Lander, E., J. Mesirov, and W. Taylor (1988), “Study of Protein Sequence Comparison Metrics on the Connection Machine CM-2,” Proc. Supercomputing'88.

  23. Lathrop, R. H. (1990), Efficient Methods For Massively Parallel Symbolic Induction: Algorithms and Implementation, Ph.D. thesis, Massachusetts Institute of Technology.

  24. Lathrop, R. H., T. A. Webster, and T. F. Smith (1987a), “ARIADNE: Pattern-Directed Inference and Hierarchical Abstraction in Protein Structure Recognition,” Comm. of the ACM, 30(11):909–921.

  25. Maryanski, F. J., and T. L. Booth (1977), “Inference of Finite-State Probabilistic Grammars,” IEEE Trans. on Computers, C-26(6):521–536.

  26. Michalski, R. S., J. G. Carbonell, and T. M. Mitchell (eds.) (1983), Machine Learning: An Artificial Intelligence Approach (first in a series), Tioga Press, Palo Alto, CA.

  27. Minsky, M. (1986), The Society of Mind, Simon and Schuster.

  28. Mitchell, T. M. (1977), “Version Spaces: A Candidate Elimination Approach to Rule Learning,” Proc. Fifth Intl. Joint Conf. on Artificial Intelligence, Cambridge, MA, pp. 305–310.

  29. Qian, N., and T. Sejnowski (1988), “Predicting the Secondary Structure of Globular Proteins Using Neural Network Models,” J. Mol. Biol., 202:865–884.

  30. Quinlan, J. R., and R. L. Rivest (1989), “Inferring Decision Trees Using the Minimum Description Length Principle,” Information and Computation, 80(3):227–248, March.

  31. Richardson, J. (1981), “The Anatomy and Taxonomy of Protein Structure,” Advances in Protein Chemistry, 34:167–339.

  32. Rumelhart, D. E., et al. (1986), Parallel Distributed Processing, volume 1, MIT Press, Cambridge, MA.

  33. Sankoff, D., and J. B. Kruskal (eds.) (1983), Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, Reading, MA, USA.

  34. Searls, D. B. (1988), “Representing Genetic Information with Formal Grammars,” in Proc. of the Seventh Natl. Conf. on Artificial Intelligence, pp. 386–391.

  35. Smith, R. F., and T. F. Smith (1990), “Automatic Generation of Primary Sequence Patterns from Sets of Related Protein Sequences,” Proc. Natl. Acad. Sci. USA, 87:118–122, Jan.

  36. Smith, T. F., and M. S. Waterman (1981), “Identification of Common Molecular Subsequences,” J. Mol. Biol., 147:195–197.

  37. Steele, G. L. (1984), Common LISP: The Language, Digital Press, Billerica, MA, USA.

  38. Tambe, M., D. Kalp, A. Gupta, C. Forgy, B. Milnes, and A. Newell (1988), “Soar/PSM-E: Investigating Match Parallelism in a Learning Production System,” Proc. Parallel Programming Environments Applications Languages and Systems.

  39. Taylor, W. (1987), “Identification of Protein Sequence Homology by Consensus Template Alignment,” J. Mol. Biol., 188:233–258.

  40. Thinking Machines Corp. (1988), Paris Reference Manual, Cambridge, MA, USA.

  41. Valiant, L. G. (1984), “A Theory of the Learnable,” Comm. of the ACM, 27(11):1134–1142.

  42. Vitter, J. S., and J. H. Lin (1988), “Learning in Parallel,” in Proc. 1988 Workshop on Computational Learning Theory (COLT'88), D. Haussler and L. Pitt (eds.), pp. 106–124.

  43. Waterman, M. S. (1984), “General Methods of Sequence Comparison,” Bull. of Math. Biol., 46:473–500.

  44. Webster, T. A., R. H. Lathrop, and T. F. Smith (1987), “Prediction of a Common Structural Domain in Aminoacyl-tRNA Synthetases Through Use of a New Pattern-Directed Inference System,” Biochemistry, 26:6950–6957.

  45. Webster, T. A., R. H. Lathrop, and T. F. Smith (1988), “Pattern Descriptors and the Unidentified Reading Frame 6 Human mtDNA Dinucleotide-Binding Site,” Proteins, 3(2):97–101.

  46. Webster, T. A., R. Patarca, R. H. Lathrop, and T. F. Smith (1989), “Potential Structural Motifs in Reverse Transcriptases,” Mol. Biol. Evol., 6(3):317–320.

  47. Webster, T. A., R. H. Lathrop, P. H. Winston, and T. F. Smith (1990), “DNA- and RNA-directed DNA Polymerase Common Structural Motif,” (submitted).

  48. Winston, P. H., T. O. Binford, B. Katz, and M. Lowry (1983), “Learning Physical Descriptions from Functional Descriptions, Examples, and Precedents,” in Proc. of the Natl. Conf. on Artificial Intelligence (Washington, DC, Aug. 22–26), William Kaufmann, Los Altos, CA, pp. 433–439.

  49. Winston, P. H. (1984), Artificial Intelligence, 2nd ed., Addison-Wesley, Reading, MA, USA.

  50. Winston, P. H., and S. Rao (1990), “Repairing Learned Knowledge using Experience,” in Artificial Intelligence at MIT: Expanding Frontiers, edited by Patrick H. Winston with Sarah A. Shellard, MIT Press, Cambridge, MA, in press.

  51. Zhang, X., D. Waltz, and J. Mesirov (1989), “Protein Structure Prediction by a Data-level Parallel Algorithm,” Proc. Supercomputing'89, Nov. 13–17, Reno, NV, USA, pp. 215–223.

  52. Zhu, Q., T. F. Smith, R. H. Lathrop, and J. Figge (1990), “The Acid Helix-Turn Activator Motif,” Proteins, 8:156–163.


Editor information

Stephen José Hanson, Werner Remmele, Ronald L. Rivest

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lathrop, R.H., Webster, T.A., Smith, T.F., Winston, P.H. (1993). Massively parallel symbolic induction of protein structure/function relationships. In: Hanson, S.J., Remmele, W., Rivest, R.L. (eds) Machine Learning: From Theory to Applications. Lecture Notes in Computer Science, vol 661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56483-7_30

  • DOI: https://doi.org/10.1007/3-540-56483-7_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56483-6

  • Online ISBN: 978-3-540-47568-2

  • eBook Packages: Springer Book Archive
