Methodology for operation shuffling and L0 cluster generation for low energy heterogeneous VLIW processors

Published: 01 September 2007

Abstract

Clustering L0 buffers is an effective way to reduce energy in the instruction memory hierarchy of embedded VLIW processors. However, the efficiency of the clustering depends on the schedule of the target application. In heterogeneous or data-clustered VLIW processors in particular, finding an energy-efficient schedule is more tightly constrained.
This article proposes a realistic technique, supported by a tool flow, that explores operation shuffling to improve the generation of L0 clusters. The tool flow explores the assignment of operations in each cycle and generates various schedules. This approach makes it possible to reduce energy consumption for various processor architectures. However, the exploration space is huge, so the computational complexity is high. We therefore develop heuristics that reduce the size of the exploration space while keeping solution quality reasonable. We also propose a technique to support VLIW processors with multiple data clusters, which is essential for applying the methodology to real-world processors.
The experimental results indicate potential energy gains of up to 27.6% in the L0 buffers through operation shuffling, for heterogeneous processor architectures as well as a homogeneous architecture. The proposed heuristics reduce the exploration search space by about 90%, while the results remain comparable to full search, with average differences of less than 1%. The proposed methodology improves energy efficiency in most of the media benchmarks, with an average gain of around 10% compared with generating clusters without operation shuffling.
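
As a rough illustration of what exploring the assignment of operations in a single cycle can mean, the Python sketch below exhaustively tries slot assignments for one VLIW cycle and keeps the one that touches the fewest L0 clusters. Everything in it is an assumption made for illustration: the function names `cluster_activations` and `shuffle_cycle`, the `allowed` map describing which issue slots each operation may occupy on a heterogeneous machine, and the cost model, which simply counts active clusters. It is not the article's tool flow or energy model.

```python
from itertools import permutations


def cluster_activations(used_slots, clusters):
    """Count the L0 clusters that contain at least one occupied slot."""
    return sum(1 for cluster in clusters
               if any(slot in used_slots for slot in cluster))


def shuffle_cycle(ops, allowed, clusters, n_slots):
    """Try every legal slot assignment for the operations of one cycle.

    ops      -- operation names issued in this cycle (hypothetical labels)
    allowed  -- dict: operation name -> set of slots it may occupy
    clusters -- list of slot tuples, one tuple per L0 cluster
    n_slots  -- number of issue slots in the VLIW word
    Returns (best_assignment, cost); (None, None) if nothing is legal.
    """
    best, best_cost = None, None
    for slots in permutations(range(n_slots), len(ops)):
        # Heterogeneous constraint: each operation must fit its slot.
        if not all(s in allowed[op] for op, s in zip(ops, slots)):
            continue
        cost = cluster_activations(set(slots), clusters)
        if best_cost is None or cost < best_cost:
            best, best_cost = dict(zip(ops, slots)), cost
    return best, best_cost


# Toy machine: 4 issue slots, grouped into two L0 clusters.
clusters = [(0, 1), (2, 3)]
allowed = {"add": {0, 1, 2}, "mul": {1, 3}}

# An unshuffled schedule with add in slot 0 and mul in slot 3
# activates both clusters.
baseline_cost = cluster_activations({0, 3}, clusters)

# Shuffling finds an assignment that stays within one cluster.
assignment, shuffled_cost = shuffle_cycle(["add", "mul"], allowed, clusters, 4)
```

The exhaustive loop grows factorially with the number of slots, which mirrors, in miniature, why the abstract reports needing heuristics to prune the exploration space.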



Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 12, Issue 4
September 2007
449 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/1278349

Publisher

Association for Computing Machinery
New York, NY, United States


Author Tags

1. Compilers for low energy
2. VLIW processors
3. Loop buffers
