DOI: 10.1145/1811212.1811215
research-article

B2P2: bounds based procedure placement for instruction TLB power reduction in embedded systems

Published: 28 June 2010

Abstract

High-performance embedded processors are equipped with a Translation Look-aside Buffer (TLB), which is key to efficient and fast virtual memory management. The TLB, though small, is accessed frequently; it therefore not only consumes significant energy but is also one of the important thermal hot-spots in the processor. Among the many circuit and microarchitectural techniques proposed to reduce TLB power consumption, the Use-Last TLB is a very efficient one in which power is consumed only when different pages are accessed in succession, i.e., when there is a page-switch [26]. Although the Use-Last technique is effective in reducing i-TLB power, its effectiveness can be improved further by changing the relative code placement of the program. In this work, we formulate the code placement problem of minimizing the page-switches in a program. We prove that this problem is NP-complete and propose an efficient Bounds Based Procedure Placement (B2P2) heuristic to reduce the program's page-switches. Our procedure placement technique delivers an average 76% reduction in instruction-TLB power with negligible (< 2%) impact on performance, over and above the reduction achieved by the Use-Last TLB architecture alone.
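To make the page-switch metric concrete, the following is a minimal Python sketch and not the B2P2 heuristic itself. It counts how often consecutive instruction fetches fall on different pages, and compares two hypothetical procedure orderings. The 4 KB page size, the procedure names and sizes, the call trace, and the simplification that each procedure activation is a single fetch at its entry address are all assumptions made for illustration; none of them come from the paper.

```python
from typing import Dict, Iterable, List, Optional, Sequence

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative, not from the paper)


def count_page_switches(addresses: Iterable[int], page_size: int = PAGE_SIZE) -> int:
    """Count consecutive fetches that land on different pages."""
    switches = 0
    last_page: Optional[int] = None
    for addr in addresses:
        page = addr // page_size
        if last_page is not None and page != last_page:
            switches += 1
        last_page = page
    return switches


def layout(order: Sequence[str], sizes: Dict[str, int], base: int = 0) -> Dict[str, int]:
    """Pack procedures back to back in the given order; return start addresses."""
    start: Dict[str, int] = {}
    addr = base
    for name in order:
        start[name] = addr
        addr += sizes[name]
    return start


def trace_addresses(calls: Sequence[str], start: Dict[str, int]) -> List[int]:
    """Simplification: one fetch per activation, at the procedure's entry address."""
    return [start[name] for name in calls]


# Hypothetical program: three procedures (sizes in bytes) and a call trace
# in which main and leaf alternate while helper is never executed.
sizes = {"main": 3000, "helper": 3000, "leaf": 1000}
calls = ["main", "leaf", "main", "leaf", "main"]

# Ordering A places the cold procedure between the two hot ones,
# pushing leaf onto the next page.
switches_a = count_page_switches(trace_addresses(calls, layout(["main", "helper", "leaf"], sizes)))

# Ordering B keeps main and leaf together on the first page.
switches_b = count_page_switches(trace_addresses(calls, layout(["main", "leaf", "helper"], sizes)))

print(switches_a)  # 4
print(switches_b)  # 0
```

In this toy setting, reordering alone removes every page-switch, which is the effect a placement heuristic tries to approximate on real programs, where the search space of orderings is too large to enumerate.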

References

[1]
ARM. ARMv5 Architecture Reference Manual, 2007.
[2]
D. F. Bacon, J.-H. Chow, D.-c. R. Ju, K. Muthukumar, and V. Sarkar. A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness. In CASCON '94: Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research, page 3. IBM Press, 1994.
[3]
D. Burger and T. M. Austin. The simplescalar tool set, version 2.0. SIGARCH Comput. Archit. News, 25(3):13--25, 1997.
[4]
S. Carr, K. S. McKinley, and C.-W. Tseng. Compiler optimizations for improving data locality. SIGOPS Oper. Syst. Rev., 28(5):252--262, 1994.
[5]
M. Cekleov and M. Dubois. Virtual-address caches part 1: Problems and solutions in uniprocessors. IEEE Micro, 17(5):64--71, 1997.
[6]
Y.-J. Chang. An ultra low-power TLB design. In DATE '06: Proceedings of the conference on Design, automation and test in Europe, pages 1122--1127, 3001 Leuven, Belgium, 2006. European Design and Automation Association.
[7]
A. Chiyonobu and T. Sato. Energy-efficient instruction scheduling utilizing cache miss information. In MEDEA '05: Proceedings of the 2005 workshop on MEmory performance, pages 65--70, Washington, DC, USA, 2005. IEEE Computer Society.
[8]
J.-H. Choi, J.-H. Lee, S.-W. Jeong, S.-D. Kim, and C. Weems. A low power TLB structure for embedded systems. Computer Architecture Letters, 1(1):3--3, January-December 2002.
[9]
L. T. Clark, B. Choi, and M. Wilkerson. Reducing translation lookaside buffer active power. In ISLPED '03, pages 10--13, New York, NY, USA, 2003. ACM Press.
[10]
V. Delaluz, M. Kandemir, N. Vijaykrishnan, M. Irwin, A. Sivasubramaniam, and I. Kolcu. Compiler-directed array interleaving for reducing energy in multi-bank memories. In ASP-DAC '02, pages 288--293, 2002.
[11]
B. Egger, J. Lee, and H. Shin. Dynamic scratchpad memory management for code in portable systems with an MMU. ACM Trans. Embed. Comput. Syst., 7(2):1--38, 2008.
[12]
M. Ekman, P. Stenström, and F. Dahlgren. TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors. In ISLPED '02: Proceedings of the 2002 international symposium on Low power electronics and design, pages 243--246, New York, NY, USA, 2002. ACM.
[13]
M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979.
[14]
M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. Mibench: A free, commercially representative embedded benchmark suite. In WWC '01: Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop on, pages 3--14, Washington, DC, USA, 2001. IEEE Computer Society.
[15]
C. X. Huang, B. Zhang, A.-C. Deng, and B. Swirski. The design and implementation of powermill. In ISLPED '95: Proceedings of the 1995 international symposium on Low power design, pages 105--110, New York, NY, USA, 1995. ACM.
[16]
X. Huang, B. T. Lewis, and K. S. McKinley. Dynamic code management: Improving whole program code locality in managed runtimes. In VEE '06: Proceedings of the 2nd international conference on Virtual execution environments, pages 133--143, New York, NY, USA, 2006. ACM.
[17]
Intel Corporation. Intel XScale® Technology Overview, 2000.
[18]
R. Jeyapaul, S. Marathe, and A. Shrivastava. Code transformations for TLB power reduction. In VLSID '09: Proceedings of the 2009 22nd International Conference on VLSI Design, pages 413--418, Washington, DC, USA, 2009. IEEE Computer Society.
[19]
I. Kadayif, P. Nath, M. Kandemir, and A. Sivasubramaniam. Compiler-directed physical address generation for reducing DTLB power. In ISPASS '04, pages 161--168, Washington, DC, USA, 2004. IEEE Computer Society.
[20]
I. Kadayif, A. Sivasubramaniam, M. Kandemir, G. Kandiraju, and G. Chen. Optimizing instruction TLB energy using software and hardware techniques. ACM Trans. Des. Autom. Electron. Syst., 10(2):229--257, 2005.
[21]
M. Kandemir, I. Kadayif, and G. Chen. Compiler-directed code restructuring for reducing data TLB energy. In CODES+ISSS '04, pages 98--103, Washington, DC, USA, 2004. IEEE Computer Society.
[22]
J.-H. Lee, G.-H. Park, S.-B. Park, and S.-D. Kim. A selective filter-bank TLB system. In ISLPED '03: Proceedings of the 2003 international symposium on Low power electronics and design, pages 312--317, New York, NY, USA, 2003. ACM.
[23]
S. Manne, A. Klauser, D. Grunwald, and F. Somenzi. Low power TLB design for high performance microprocessors, 1997.
[24]
MIPS Technologies. MIPS32® Architecture, 2003.
[25]
A. Parikh, S. Kim, M. Kandemir, N. Vijaykrishnan, and M. Irwin. Instruction scheduling for low power. In VLSI-SP '04, pages 129--149, 2004.
[26]
J. R. Haigh, M. Wilkerson, J. Miller, T. Beatty, S. Strazdus, and L. Clark. A low-power 2.5 GHz 90 nm level 1 cache and memory management unit. In IEEE Journal of Solid State Circuits, pages 1190--1199. IEEE Press, 2004.
[27]
H. Tomiyama and H. Yasuura. Code placement techniques for cache miss rate reduction. ACM Trans. Des. Autom. Electron. Syst., 2(4):410--429, 1997.

Cited By

  • (2016) A survey of techniques for architecting TLBs. Concurrency and Computation: Practice and Experience, 29:10. DOI: 10.1002/cpe.4061. Online publication date: 22-Dec-2016.


    Published In

    SCOPES '10: Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
    June 2010
    91 pages
    ISBN:9781450300841
    DOI:10.1145/1811212

    Sponsors

    • EDAA: European Design Automation Association

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. code placement
    2. compiler technique
    3. embedded processors
    4. instruction TLB
    5. memory management
    6. power

    Qualifiers

    • Research-article

    Conference

    SCOPES '10
    Sponsor:
    • EDAA

    Acceptance Rates

    Overall Acceptance Rate 38 of 79 submissions, 48%
