On approximating the ideal random access machine by physical machines

Published: 21 August 2009

Abstract

The capability of the Random Access Machine (RAM) to execute any instruction in constant time is not realizable, due to fundamental physical constraints on the minimum size of devices and on the maximum speed of signals. This work explores how well the ideal RAM performance can be approximated, for significant classes of computations, by machines whose building blocks have constant size and are connected at a constant distance.
A novel memory structure is proposed, which is pipelined (it can accept a new request at each cycle) and hierarchical, exhibiting optimal latency a(x) = O(x^(1/d)) to address x in d-dimensional realizations.
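The latency bound above reflects a simple packing argument: x constant-size memory cells at constant density in d-dimensional space occupy a region of radius proportional to x^(1/d), so a signal must travel at least that far to reach address x. A minimal numeric sketch of this model (the unit constants and the function name are illustrative assumptions, not from the paper):

```python
# Idealized physical-latency model: with constant-size cells and bounded
# signal speed, the latency to address x in a d-dimensional layout grows
# as a(x) = Theta(x**(1/d)); constants are normalized to 1 here.

def access_latency(x: int, d: int) -> float:
    """Latency (in unit signal-propagation steps) to reach address x
    when memory is laid out in d dimensions around the processor."""
    return x ** (1.0 / d)

# The ideal RAM assumes constant-time access; moving from 1D to 3D
# layouts softens, but never removes, the growth with x.
for d in (1, 2, 3):
    print(f"d={d}: a(4096) = {access_latency(4096, d):.1f}")
```

In 2D, quadrupling the address only doubles the latency, which is why higher-dimensional realizations approximate the ideal RAM more closely.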
Despite block-transfer and other memory-pipeline capabilities, a number of previous machine models fail to achieve full overlap of memory accesses; these are examples of machines with explicit data movement. It is shown that, on all such machines, there exist direct-flow computations (without branches and indirect accesses) that require time superlinear in the number of instructions.
To circumvent the explicit-data-movement constraints, the Speculative Prefetcher (SP) and the Speculative Prefetcher and Evaluator (SPE) processors are developed. Both processors can execute any direct-flow program in linear time. The SPE also executes in linear time a class of loop programs that includes many significant algorithms; even quicksort, a somewhat irregular, recursive algorithm, admits a linear-time SPE implementation. A relation between instructions called address dependence is introduced, which limits memory-access overlap and can lead to superlinear time, as illustrated with the classical merging algorithm.
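Address dependence can be made concrete with the merging example: in the classical merge, which array is read next depends on the outcome of the previous comparison, so the sequence of addresses cannot be computed ahead of the data and the accesses cannot be fully overlapped. A small sketch of this behavior (the trace-recording helper is an illustrative device, not the paper's formalism):

```python
def merge_trace(a, b):
    """Classical two-way merge that also records which array each read
    targets. Every address in the trace depends on the preceding
    comparison, so a prefetcher cannot issue the reads in advance:
    this is the address dependence that limits memory-access overlap."""
    i = j = 0
    out, trace = [], []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            trace.append(('a', i)); out.append(a[i]); i += 1
        else:
            trace.append(('b', j)); out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out, trace

# Contrast with a direct-flow pattern such as elementwise addition, whose
# addresses 0, 1, 2, ... are known in advance and fully overlappable.
out, trace = merge_trace([1, 4, 6], [2, 3, 5])
print(out)    # the merged sequence
print(trace)  # data-dependent interleaving of reads from a and b
```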


Cited By

  • Optimal On-Line Computation of Stack Distances for MIN and OPT. Proceedings of the Computing Frontiers Conference, 237-246, May 2017. doi:10.1145/3075564.3075571
  • Outline of a Thick Control Flow Architecture. 2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 1-6, October 2016. doi:10.1109/SBAC-PADW.2016.9
  • Efficient Stack Distance Computation for a Class of Priority Replacement Policies. International Journal of Parallel Programming 41(3), 430-468, July 2012. doi:10.1007/s10766-012-0200-2
  • Efficient Stack Distance Computation for Priority Replacement Policies. Proceedings of the 8th ACM International Conference on Computing Frontiers, 1-10, May 2011. doi:10.1145/2016604.2016607


Published In

Journal of the ACM, Volume 56, Issue 5
August 2009, 164 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/1552285

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Received: 1 October 2007
Revised: 1 November 2008
Accepted: 1 March 2009
Published: 21 August 2009


Author Tags

  1. Physical constraints on machines
  2. pipelined hierarchical memory
  3. speculative processors

