Synchronizing large VLSI processor arrays

Published: 13 June 1983

ABSTRACT

Highly parallel VLSI computing structures consist of many processing elements operating simultaneously. In order for such processing elements to communicate among themselves, some provision must be made for synchronization of data transfer. The simplest means of synchronization is the use of a global clock. Unfortunately, large clocked systems can be difficult to implement because of the inevitable problem of clock skews and delays, which can be especially acute in VLSI systems as feature sizes shrink. For the near term, good engineering and technology improvements can be expected to maintain the feasibility of clocking in such systems; however, clock distribution problems crop up in any technology as systems grow. An alternative means of enforcing necessary synchronization is the use of self-timed, asynchronous schemes, at the price of greater design complexity and hardware cost. Realizing that different circumstances call for different synchronization methods, this paper provides a spectrum of synchronization models; based on the assumptions made for each model, theoretical lower bounds on clock skew are derived, and appropriate or best-possible synchronization schemes for large processor arrays are proposed.

One set of models is based on assumptions that allow the use of a pipelined clocking scheme, where more than one clock event is propagated at a time. In this case, it is shown that even assuming that physical variations along clock lines can produce skews between wires of the same length, any one-dimensional processor array can be correctly synchronized by a global pipelined clock while enjoying desirable properties such as modularity, expandability and robustness. This result cannot be extended to two-dimensional arrays, however: the paper shows that under this assumption, it is impossible to run a clock such that the maximum clock skew between two communicating cells is bounded by a constant as systems grow. For such cases, or where pipelined clocking is unworkable, a synchronization scheme incorporating both clocked and “asynchronous” elements is proposed.
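
The one-dimensional claim can be illustrated with a small numerical sketch. This is not the paper's construction or proof. It assumes a simplified model in which the clock wire runs along the array from cell to cell and each wire segment has a delay drawn independently from a fixed interval. The function names, the nominal delay and the variation are hypothetical values chosen for illustration. Under these assumptions, the skew between neighboring (communicating) cells never exceeds the largest single-segment delay, however long the array is, while the skew between the two ends of the array grows with its length.

```python
import random


def clock_arrival_times(n_cells, nominal_delay=1.0, variation=0.2, seed=0):
    """Arrival time of one clock event at each cell of a linear array.

    Assumed model (illustrative only): the clock enters at cell 0 and
    travels along the array; each wire segment between adjacent cells has
    a delay in [nominal_delay - variation, nominal_delay + variation].
    """
    rng = random.Random(seed)
    arrivals = [0.0]
    for _ in range(n_cells - 1):
        segment = rng.uniform(nominal_delay - variation,
                              nominal_delay + variation)
        arrivals.append(arrivals[-1] + segment)
    return arrivals


def max_neighbor_skew(arrivals):
    """Largest difference in arrival time between adjacent cells."""
    return max(b - a for a, b in zip(arrivals, arrivals[1:]))


def end_to_end_skew(arrivals):
    """Difference in arrival time between the first and last cell."""
    return arrivals[-1] - arrivals[0]


if __name__ == "__main__":
    for n in (10, 100, 1000):
        times = clock_arrival_times(n)
        print(n, max_neighbor_skew(times), end_to_end_skew(times))
```

Running the script for growing array sizes shows the neighbor skew staying below the assumed per-segment maximum of 1.2 time units, while the end-to-end skew grows roughly in proportion to the number of cells. Only the neighbor skew matters for correct data transfer, since only neighboring cells communicate. The large end-to-end skew also means that several clock events are in flight along the wire at once, which is the pipelined clocking the abstract refers to.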
