Article

A spatial path scheduling algorithm for EDGE architectures

Authors:

Katherine E. Coons,

Kathryn S. McKinley,

Sundeep K. Kushwaha

ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems

Pages 129 - 140

https://doi.org/10.1145/1168857.1168875

Published: 20 October 2006

Abstract

Growing on-chip wire delays are motivating architectural features that expose on-chip communication to the compiler. EDGE architectures are one example of communication-exposed microarchitectures in which the compiler forms dataflow graphs that specify how the microarchitecture maps instructions onto a distributed execution substrate. This paper describes a compiler scheduling algorithm called spatial path scheduling that factors in previously fixed locations - called anchor points - for each placement. This algorithm extends easily to different spatial topologies. We augment this basic algorithm with three heuristics: (1) local and global ALU and network link contention modeling, (2) global critical path estimates, and (3) dependence chain path reservation. We use simulated annealing to explore possible performance improvements and to motivate the augmented heuristics and their weighting functions. We show that the spatial path scheduling algorithm augmented with these three heuristics achieves a 21% average performance improvement over the best prior algorithm and comes within an average of 5% of the annealed performance for our benchmarks.
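The anchor-point idea in the abstract can be illustrated with a small sketch. The code below is not the authors' algorithm. It is a minimal greedy placer written under stated assumptions: a hypothetical 4x4 grid of ALUs, one cycle per network hop measured by Manhattan distance, and a crude contention term that charges one cycle for each instruction already assigned to a tile. The names `place`, `anchors`, `deps`, and `slots` are invented for illustration. Each instruction goes to the tile that minimizes its estimated finish time given the locations already fixed, whether those are anchors supplied up front or producers placed earlier.

```python
from itertools import product


def manhattan(a, b):
    """Hop count between two (row, col) tiles on a mesh (assumed topology)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def place(instrs, deps, anchors, rows=4, cols=4, slots=8):
    """Greedy placement sketch (hypothetical, not the paper's heuristic).

    instrs  : instruction names in topological order
    deps    : dict name -> list of producer names (instructions or anchors)
    anchors : dict name -> (row, col) of locations fixed before scheduling
    """
    loc = dict(anchors)                # name -> tile chosen or fixed so far
    ready = {a: 0 for a in anchors}    # name -> cycle its value is available
    load = {t: 0 for t in product(range(rows), range(cols))}

    for ins in instrs:
        best = None
        for tile in load:
            if load[tile] >= slots:
                continue  # tile has no free instruction slot
            # Operands arrive when the producer is ready plus one cycle per hop.
            arrive = max(
                (ready[p] + manhattan(loc[p], tile) for p in deps.get(ins, [])),
                default=0,
            )
            # Assumed contention model: earlier instructions on the tile delay issue.
            finish = arrive + load[tile] + 1
            if best is None or finish < best[0]:
                best = (finish, tile)
        if best is None:
            raise ValueError("no free slot on any tile")
        ready[ins], loc[ins] = best
        load[best[1]] += 1
    return loc, ready


# Two anchors on the top row; the consumer lands on the tile between them.
loc, ready = place(
    instrs=["add"],
    deps={"add": ["r0", "r1"]},
    anchors={"r0": (0, 0), "r1": (0, 2)},
)
print(loc["add"], ready["add"])
```

A real scheduler would replace the per-tile load term with the link and ALU contention estimates, critical-path weights, and path reservation that the abstract lists; the sketch only shows how fixed locations steer each placement decision.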

Index Terms

A spatial path scheduling algorithm for EDGE architectures
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems

October 2006

440 pages

ISBN:1595934510

DOI:10.1145/1168857

General Chair: John Paul Shen, Intel Corp.
Program Chair: Margaret R. Martonosi, Princeton University

ACM SIGPLAN Notices Volume 41, Issue 11
Proceedings of the 2006 ASPLOS Conference
November 2006
425 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1168918
ACM SIGARCH Computer Architecture News Volume 34, Issue 5
Proceedings of the 2006 ASPLOS Conference
December 2006
425 pages
ISSN:0163-5964
DOI:10.1145/1168919
ACM SIGOPS Operating Systems Review Volume 40, Issue 5
Proceedings of the 2006 ASPLOS Conference
December 2006
425 pages
ISSN:0163-5980
DOI:10.1145/1168917

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ASPLOS06

ASPLOS06: Architectural Support for Programming Languages and Operating Systems

October 21 - 25, 2006

San Jose, California, USA

Acceptance Rates

ASPLOS XII Paper Acceptance Rate 38 of 158 submissions, 24%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics

Article Metrics

Total Citations: 46
Total Downloads: 840
Downloads (last 12 months): 5
Downloads (last 6 weeks): 2

Reflects downloads up to 17 Feb 2025
