Abstract
Aquarius-II is a cache-coherent multiprocessor system designed for the parallel execution of Prolog programs. It contains two tiers of memory: synchronization memory and high-bandwidth (HB) memory. The synchronization memory consists of snooping caches connected to a bus and is used to store rendezvous points, synchronization bits, synchronization variables such as locks and semaphores, and most of the write-shared data. The HB memory is used to store the bulk of the application program code and data. It contains caches and an inexpensive, VLSI-chip-based crossbar interconnection network to memory. The caches connected to the crossbar do not have full snooping capability. The architecture is evaluated by a full simulation of parallel execution of Prolog programs on Aquarius-II. The design details of the components of the architecture and simulation results are presented. Simulation results indicate that the two-tier memory system significantly reduces memory interference and speeds up synchronization when compared to a single-bus multiprocessor. This shared-memory multiprocessor architecture has the potential to support other parallel programming paradigms.
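The division of data between the two tiers described in the abstract can be illustrated with a small sketch. This is not the Aquarius-II implementation: the names `Tier`, `Kind`, and `place` are hypothetical, and the sketch simplifies the abstract's "most of the write-shared data" to all write-shared data.

```python
from enum import Enum, auto


class Tier(Enum):
    """The two memory tiers named in the abstract."""
    SYNCHRONIZATION = auto()  # snooping caches connected to a bus
    HIGH_BANDWIDTH = auto()   # caches behind a crossbar, without full snooping


class Kind(Enum):
    """Hypothetical labels for the categories of data the abstract mentions."""
    RENDEZVOUS_POINT = auto()
    SYNC_BIT = auto()
    LOCK = auto()
    SEMAPHORE = auto()
    WRITE_SHARED_DATA = auto()
    PROGRAM_CODE = auto()
    OTHER_DATA = auto()


# Categories the abstract assigns to synchronization memory.
# Assumption: all write-shared data is placed here, whereas the
# abstract says only "most" of it is.
_SYNC_KINDS = frozenset({
    Kind.RENDEZVOUS_POINT,
    Kind.SYNC_BIT,
    Kind.LOCK,
    Kind.SEMAPHORE,
    Kind.WRITE_SHARED_DATA,
})


def place(kind: Kind) -> Tier:
    """Return the tier that would hold data of the given kind in this toy model."""
    if kind in _SYNC_KINDS:
        return Tier.SYNCHRONIZATION
    return Tier.HIGH_BANDWIDTH


if __name__ == "__main__":
    for kind in Kind:
        print(f"{kind.name:18s} -> {place(kind).name}")
```

In this toy model only the small, frequently contended items go through the bus, while the bulk of code and data uses the crossbar. That split is the property the abstract credits for reduced memory interference.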
Cite this article
Srini, V.P., Nguyen, T.M., Busing, D.R. et al. Design and Simulation of the Aquarius-II Multiprocessor. Journal of Systems Integration 7, 151–178 (1997). https://doi.org/10.1023/A:1008211321698