DOI: 10.1145/2248418.2248424
research-article

WCET-aware re-scheduling register allocation for real-time embedded systems with clustered VLIW architecture

Published: 12 June 2012

Abstract

Worst-Case Execution Time (WCET) is one of the most important metrics in real-time embedded system design. For embedded systems with clustered VLIW architectures, register allocation, instruction scheduling, and cluster assignment are three key code optimization activities, each with a profound impact on WCET. At the same time, these three activities exhibit a phase-ordering problem: performing register allocation, scheduling, and cluster assignment independently can degrade the results of the other phases and thereby generate sub-optimal compiled code. This paper proposes a compiler-level optimization, WCET-aware Re-scheduling Register Allocation (WRRA), to minimize WCET for real-time embedded systems with clustered VLIW architectures. The novelty of the approach is that it takes into account the effects of register allocation, instruction scheduling, and cluster assignment on the quality of the generated code when minimizing WCET. These three compilation processes are integrated into a single phase to obtain a balanced result. The technique is implemented in Trimaran 4.0. Experimental results show that it reduces WCET by 33% on average.

    Published In

    LCTES '12: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
    June 2012
    153 pages
    ISBN: 9781450312127
    DOI: 10.1145/2248418

    ACM SIGPLAN Notices, Volume 47, Issue 5 (LCTES '12)
    May 2012
    152 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2345141
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. WCET
    2. cluster assignment
    3. instruction scheduling
    4. register allocation

    Qualifiers

    • Research-article

    Conference

    LCTES '12

    Acceptance Rates

    Overall Acceptance Rate 116 of 438 submissions, 26%

    Cited By

    • (2017) On Improving Performance and Energy Efficiency for Register-File Connected Clustered VLIW Architectures for Embedded System Usage. The Computer Journal. DOI: 10.1093/comjnl/bxx001. Online publication date: 22-Jan-2017
    • (2016) Large Page Address Mapping in Massive Parallel Processor Systems. 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS), pp. 877-883. DOI: 10.1109/ICPADS.2016.0118. Online publication date: Dec-2016
    • (2015) Joint WCET and Update Activity Minimization for Cyber-Physical Systems. ACM Transactions on Embedded Computing Systems 14:1, pp. 1-21. DOI: 10.1145/2680539. Online publication date: 21-Jan-2015
    • (2014) Effort at Constructing Big Data Sensor Networks for Monitoring Greenhouse Gas Emission. International Journal of Distributed Sensor Networks 10:7, article 619608. DOI: 10.1155/2014/619608. Online publication date: Jan-2014
    • (2014) WCET-aware scheduling optimizations for multi-core real-time systems. 2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), pp. 67-74. DOI: 10.1109/SAMOS.2014.6893196. Online publication date: Jul-2014
    • (2014) Loop Transforming for Reducing Data Alignment on Multi-Core SIMD Processors. Journal of Signal Processing Systems 74:2, pp. 137-150. DOI: 10.1007/s11265-013-0754-2. Online publication date: 1-Feb-2014
    • (2013) Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures. The Scientific World Journal 2013:1. DOI: 10.1155/2013/913038. Online publication date: 18-Jul-2013
