research-article

WCET-aware re-scheduling register allocation for real-time embedded systems with clustered VLIW architecture

Authors:

Chun Jason Xue

LCTES '12: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems

Pages 31 - 40

https://doi.org/10.1145/2248418.2248424

Published: 12 June 2012

Abstract

Worst-Case Execution Time (WCET) is one of the most important metrics in real-time embedded system design. For embedded systems with a clustered VLIW architecture, register allocation, instruction scheduling, and cluster assignment are three key code optimization activities, and each has a profound impact on WCET. These three activities also exhibit a phase-ordering problem: performing register allocation, scheduling, and cluster assignment independently can degrade the results of the other phases and thereby produce sub-optimal compiled code. This paper proposes a compiler-level optimization, WCET-aware Re-scheduling Register Allocation (WRRA), to minimize WCET for real-time embedded systems with a clustered VLIW architecture. The novelty of the approach is that it accounts for the effects of register allocation, instruction scheduling, and cluster assignment on the quality of the generated code when minimizing WCET. The three compilation processes are integrated into a single phase to obtain a balanced result. The technique is implemented in Trimaran 4.0. Experimental results show that it reduces WCET by 33% on average.
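To make the notion of a WCET-aware decision concrete, the following minimal Python sketch treats the WCET of a loop-free control-flow graph as the cost of its longest path, then shows that a register spill only raises the WCET when it lands on that worst-case path. This is an illustration under stated assumptions, not the WRRA algorithm from the paper: the graph, the per-block cycle counts, the `SPILL_PENALTY` value, and the helper names `wcet` and `with_spill` are all hypothetical.

```python
from functools import lru_cache

def wcet(cfg, cost, entry):
    """Return the longest-path cost from `entry` through an acyclic CFG.

    cfg  maps each basic block to a tuple of successor blocks.
    cost maps each basic block to its (assumed) worst-case cycle count.
    """
    @lru_cache(maxsize=None)
    def longest(block):
        succs = cfg.get(block, ())
        # A block's worst case is its own cost plus the worst successor path.
        return cost[block] + max((longest(s) for s in succs), default=0)

    return longest(entry)

# Hypothetical diamond-shaped CFG: entry -> {hot, cold} -> exit.
cfg = {"entry": ("hot", "cold"), "hot": ("exit",), "cold": ("exit",), "exit": ()}
base = {"entry": 2, "hot": 10, "cold": 4, "exit": 1}

SPILL_PENALTY = 3  # assumed extra cycles for spill code placed in one block

def with_spill(cost, block):
    """Return a copy of `cost` with spill code charged to `block`."""
    updated = dict(cost)
    updated[block] += SPILL_PENALTY
    return updated

baseline = wcet(cfg, base, "entry")                        # entry + hot + exit
spill_hot = wcet(cfg, with_spill(base, "hot"), "entry")    # spill on the worst-case path
spill_cold = wcet(cfg, with_spill(base, "cold"), "entry")  # spill off the worst-case path

print(baseline, spill_hot, spill_cold)
```

In this toy setting, charging the spill to `cold` leaves the longest path unchanged, while charging it to `hot` lengthens it. A WCET-aware allocator in this simplified model would therefore prefer to spill values whose accesses lie off the worst-case path; an integrated approach must additionally account for how scheduling and cluster assignment change the block costs themselves.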

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

LCTES '12: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems

June 2012

153 pages

ISBN:9781450312127

DOI:10.1145/2248418

General Chair: Reinhard Wilhelm (Saarland University, Saarbrücken)
Program Chairs: Heiko Falk (Ulm University), Wang Yi (Uppsala University)

ACM SIGPLAN Notices Volume 47, Issue 5
LCTES '12
MAY 2012
152 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2345141

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Research Grants Council, University Grants Committee, Hong Kong

Conference

LCTES '12

LCTES '12: SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2012

June 12 - 13, 2012

Beijing, China

Acceptance Rates

Overall Acceptance Rate 116 of 438 submissions, 26%

Citations

Cited By

He H, Yang X, Zhang Y (2017). On Improving Performance and Energy Efficiency for Register-File Connected Clustered VLIW Architectures for Embedded System Usage. The Computer Journal. Online publication date: 22-Jan-2017.
https://doi.org/10.1093/comjnl/bxx001
Sun Y, Yi X, Liu H (2016). Large Page Address Mapping in Massive Parallel Processor Systems. 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS), pp. 877-883. Online publication date: Dec-2016.
https://doi.org/10.1109/ICPADS.2016.0118
Huang Y, Zhao M, Xue C (2015). Joint WCET and Update Activity Minimization for Cyber-Physical Systems. ACM Transactions on Embedded Computing Systems 14:1, pp. 1-21. Online publication date: 21-Jan-2015.
https://dl.acm.org/doi/10.1145/2680539
Tang H, Yang X, Zhang Y (2014). Effort at Constructing Big Data Sensor Networks for Monitoring Greenhouse Gas Emission. International Journal of Distributed Sensor Networks 10:7, article 619608. Online publication date: Jan-2014.
https://doi.org/10.1155/2014/619608
Kelter T, Borghorst H, Marwedel P (2014). WCET-aware scheduling optimizations for multi-core real-time systems. 2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), pp. 67-74. Online publication date: Jul-2014.
https://doi.org/10.1109/SAMOS.2014.6893196
Wang Y, Pan L, Shao Z, Guan Y, Guo M (2014). Loop Transforming for Reducing Data Alignment on Multi-Core SIMD Processors. Journal of Signal Processing Systems 74:2, pp. 137-150. Online publication date: 1-Feb-2014.
https://dl.acm.org/doi/10.1007/s11265-013-0754-2
Tang H, Yang X, Wang S, Zhang Y (2013). Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures. The Scientific World Journal 2013:1. Online publication date: 18-Jul-2013.
https://doi.org/10.1155/2013/913038
