
Exploring the potential of heterogeneous von Neumann/dataflow execution models

Authors:

Vinay Gangadhar,

Karthikeyan Sankaralingam

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture

Pages 298 - 310

https://doi.org/10.1145/2749469.2750380

Published: 13 June 2015

Abstract

General-purpose processors (GPPs), from small in-order designs to many-issue out-of-order designs, incur large power overheads which must be addressed for future technology generations. Major sources of overhead include structures which dynamically extract the data-dependence graph or maintain precise state. For irregular workloads, current specialization approaches either heavily curtail performance or provide too little benefit. Interestingly, well-known explicit-dataflow architectures eliminate these overheads by directly executing the data-dependence graph and eschewing instruction-precise recoverability. However, even after decades of research, dataflow architectures have yet to come into prominence as a solution. We attribute this to a lack of effective control speculation and to the latency overhead of explicit communication, which is crippling for certain codes.
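To make "directly executing the data-dependence graph" concrete, here is a minimal sketch of the classic dataflow firing rule: a node executes as soon as all of its operands are available, with no program counter imposing an order. The graph encoding, the function name `run_dataflow`, and the example expression are illustrative assumptions, not taken from the paper, and the sketch does not model SEED or any real hardware.

```python
from collections import deque
import operator


def run_dataflow(nodes, inputs):
    """Fire nodes of a data-dependence graph as their operands arrive.

    nodes:  dict mapping node name -> (operation, [source names]).
            This encoding is a hypothetical one chosen for illustration.
    inputs: dict mapping input name -> value.
    Returns (values, firing_order).
    """
    values = dict(inputs)
    consumers = {}  # value name -> nodes waiting on it
    pending = {}    # node name -> set of operands not yet produced
    for name, (_, srcs) in nodes.items():
        pending[name] = {s for s in srcs if s not in values}
        for s in set(srcs):
            consumers.setdefault(s, []).append(name)

    # Nodes whose operands are all graph inputs can fire immediately.
    ready = deque(n for n, missing in pending.items() if not missing)
    order = []
    while ready:
        name = ready.popleft()
        op, srcs = nodes[name]
        values[name] = op(*(values[s] for s in srcs))
        order.append(name)
        # Deliver the result to consumers; wake those now fully supplied.
        for c in consumers.get(name, []):
            pending[c].discard(name)
            if not pending[c]:
                ready.append(c)
    return values, order


# Example graph for (a + b) * (a - b); "sum" and "diff" are independent.
graph = {
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
    "prod": (operator.mul, ["sum", "diff"]),
}
values, order = run_dataflow(graph, {"a": 5, "b": 3})
print(values["prod"], order)
```

In this toy model, `sum` and `diff` become ready at the same time, which is the parallelism an out-of-order core would have to rediscover dynamically from a sequential instruction stream.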

This paper makes the observation that if both out-of-order and explicit-dataflow execution were available in one processor, many types of GPP cores could benefit from dynamically switching between them during certain phases of an application's lifetime. Analysis reveals that an ideal explicit-dataflow engine could be profitable for more than half of instructions, providing significant performance and energy improvements. The challenge is to achieve these benefits without introducing excess hardware complexity. To this end, we propose the Specialization Engine for Explicit-Dataflow (SEED). Integrated with an in-order core, SEED provides 1.67× performance and 1.65× energy benefits; with a dual-issue out-of-order (OOO) core, 1.33× and 1.70×; and with a quad-issue OOO core, 1.14× and 1.54×.
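The idea of switching engines per application phase can be illustrated with a toy oracle model: for each phase, run it on whichever engine is estimated to be cheaper. This is only a sketch under stated assumptions. The function name `oracle_switching_speedup`, the `switch_cost` parameter, and the cycle counts below are all hypothetical; they are not the paper's methodology or data.

```python
def oracle_switching_speedup(phases, switch_cost=0):
    """Estimate speedup from ideal per-phase engine selection.

    phases:      list of (core_cycles, dataflow_cycles) pairs, one per
                 program phase. These are hypothetical estimates.
    switch_cost: assumed cycles charged each time a phase is offloaded.
    Returns (speedup over core-only execution, number of offloaded phases).
    """
    baseline = 0
    hybrid = 0
    offloaded = 0
    for core_cycles, dataflow_cycles in phases:
        baseline += core_cycles
        # Offload only when dataflow wins even after paying the switch cost.
        if dataflow_cycles + switch_cost < core_cycles:
            hybrid += dataflow_cycles + switch_cost
            offloaded += 1
        else:
            hybrid += core_cycles
    return baseline / hybrid, offloaded


# Made-up phase estimates: dataflow helps in phases 1 and 3, hurts in phase 2.
example_phases = [(100, 60), (80, 120), (50, 25)]
speedup, count = oracle_switching_speedup(example_phases)
print(f"speedup={speedup:.2f}, offloaded phases={count}")
```

Raising `switch_cost` in this model shrinks the set of phases worth offloading, which is one way to see why low-overhead switching matters for a hybrid design.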




Published In

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture

June 2015

768 pages

ISBN: 9781450334020

DOI: 10.1145/2749469

General Chair: Debbie Marr (Intel)

Program Chair: David Albonesi (Cornell)

ACM SIGARCH Computer Architecture News, Volume 43, Issue 3S (ISCA'15)

June 2015

745 pages

ISSN: 0163-5964

DOI: 10.1145/2872887

Editor: Doug DeGroot

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

IEEE TCCA: IEEE Computer Society Technical Committee on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

ISCA '15: The 42nd Annual International Symposium on Computer Architecture

June 13 - 17, 2015

Portland, Oregon

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%



