On approximating the ideal random access machine by physical machines

Published: 21 August 2009

Abstract

The capability of the Random Access Machine (RAM) to execute any instruction in constant time is not realizable, due to fundamental physical constraints on the minimum size of devices and on the maximum speed of signals. This work explores how well the ideal RAM performance can be approximated, for significant classes of computations, by machines whose building blocks have constant size and are connected at a constant distance.
A novel memory structure is proposed, which is pipelined (it can accept a new request at each cycle) and hierarchical, exhibiting optimal latency a(x) = O(x^(1/d)) to address x in d-dimensional realizations.
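The latency bound above reflects a simple packing argument: x constant-size memory cells at constant density in d-dimensional space occupy a region of radius proportional to x^(1/d), so a signal must travel at least that far to reach address x. A minimal numeric sketch of this model (the unit constants and the function name are illustrative assumptions, not from the paper):

```python
# Idealized physical-latency model: with constant-size cells and bounded
# signal speed, the latency to address x in a d-dimensional layout grows
# as a(x) = Theta(x**(1/d)); constants are normalized to 1 here.

def access_latency(x: int, d: int) -> float:
    """Latency (in unit signal-propagation steps) to reach address x
    when memory is laid out in d dimensions around the processor."""
    return x ** (1.0 / d)

# The ideal RAM assumes constant-time access; moving from 1D to 3D
# layouts softens, but never removes, the growth with x.
for d in (1, 2, 3):
    print(f"d={d}: a(4096) = {access_latency(4096, d):.1f}")
```

In 2D, quadrupling the address only doubles the latency, which is why higher-dimensional realizations approximate the ideal RAM more closely.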
Despite block-transfer and other memory-pipeline capabilities, a number of previous machine models fail to achieve full overlap of memory accesses; these are examples of machines with explicit data movement. It is shown that, on all such machines, there exist direct-flow computations (without branches and indirect accesses) that require time superlinear in the number of instructions.
To circumvent the explicit-data-movement constraints, the Speculative Prefetcher (SP) and the Speculative Prefetcher and Evaluator (SPE) processors are developed. Both processors can execute any direct-flow program in linear time. The SPE also executes in linear time a class of loop programs that includes many significant algorithms; even quicksort, a somewhat irregular, recursive algorithm, admits a linear-time SPE implementation. A relation between instructions called address dependence is introduced, which limits memory-access overlap and can lead to superlinear time, as illustrated with the classical merging algorithm.
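Address dependence can be made concrete with the merging example: in the classical merge, which array is read next depends on the outcome of the previous comparison, so the sequence of addresses cannot be computed ahead of the data and the accesses cannot be fully overlapped. A small sketch of this behavior (the trace-recording helper is an illustrative device, not the paper's formalism):

```python
def merge_trace(a, b):
    """Classical two-way merge that also records which array each read
    targets. Every address in the trace depends on the preceding
    comparison, so a prefetcher cannot issue the reads in advance:
    this is the address dependence that limits memory-access overlap."""
    i = j = 0
    out, trace = [], []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            trace.append(('a', i)); out.append(a[i]); i += 1
        else:
            trace.append(('b', j)); out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out, trace

# Contrast with a direct-flow pattern such as elementwise addition, whose
# addresses 0, 1, 2, ... are known in advance and fully overlappable.
out, trace = merge_trace([1, 4, 6], [2, 3, 5])
print(out)    # the merged sequence
print(trace)  # data-dependent interleaving of reads from a and b
```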


Cited By

  • Optimal On-Line Computation of Stack Distances for MIN and OPT. Proceedings of the Computing Frontiers Conference, 237-246, May 2017. doi:10.1145/3075564.3075571
  • Outline of a Thick Control Flow Architecture. 2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 1-6, October 2016. doi:10.1109/SBAC-PADW.2016.9
  • Efficient Stack Distance Computation for a Class of Priority Replacement Policies. International Journal of Parallel Programming 41(3), 430-468, July 2012. doi:10.1007/s10766-012-0200-2
  • Efficient Stack Distance Computation for Priority Replacement Policies. Proceedings of the 8th ACM International Conference on Computing Frontiers, 1-10, May 2011. doi:10.1145/2016604.2016607


Published In

Journal of the ACM, Volume 56, Issue 5
August 2009, 164 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/1552285

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Received: 1 October 2007
Revised: 1 November 2008
Accepted: 1 March 2009
Published: 21 August 2009


Author Tags

  1. Physical constraints on machines
  2. pipelined hierarchical memory
  3. speculative processors

