research-article

Application-aware snoop filtering for low-power cache coherence in embedded multiprocessors

Published: 06 February 2008

Abstract

Maintaining local caches coherently in shared-memory multiprocessors results in significant power consumption. The customization methodology we propose exploits the fact that in embedded systems, important knowledge is available to the system designers regarding memory sharing between tasks. We demonstrate how the snoop-induced cache probings can be significantly reduced by identifying and exploiting in a deterministic way the shared memory regions between the processors. Snoop activity is enabled only for the accesses referring to known shared regions. The hardware support is not only cost efficient, but also software programmable, which allows for reprogrammability and customization across different tasks and applications.
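
The filtering idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the article: the names `SharedRegion`, `SnoopFilter`, `program`, and `needs_snoop`, and the example addresses, are all hypothetical. It assumes the designer supplies a list of address ranges known to be shared, and that a snoop is issued only when an access falls inside one of them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SharedRegion:
    """Hypothetical descriptor for one shared address range [base, base + size)."""
    base: int
    size: int

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


@dataclass
class SnoopFilter:
    """Software model of a programmable filter (an assumption, not the paper's hardware)."""
    regions: list = field(default_factory=list)
    snoops_sent: int = 0
    snoops_filtered: int = 0

    def program(self, regions):
        """Reprogram the filter, for example when a different task set is loaded."""
        self.regions = list(regions)

    def needs_snoop(self, addr: int) -> bool:
        """Return True only if addr lies in a region declared as shared."""
        shared = any(r.contains(addr) for r in self.regions)
        if shared:
            self.snoops_sent += 1
        else:
            self.snoops_filtered += 1
        return shared


# Example: one shared 4 KiB buffer; every other address is treated as private.
filt = SnoopFilter()
filt.program([SharedRegion(base=0x8000, size=0x1000)])

accesses = [0x1000, 0x8000, 0x8FFF, 0x9000, 0x2004]
probes = [a for a in accesses if filt.needs_snoop(a)]

print([hex(a) for a in probes], filt.snoops_sent, filt.snoops_filtered)
```

In this model, only the two accesses inside the declared region trigger a probe of the remote caches; the other three are filtered. Calling `program` again with a different region list mirrors the software reprogrammability the abstract mentions.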

        • Published in

          ACM Transactions on Design Automation of Electronic Systems  Volume 13, Issue 1
          January 2008
          496 pages
          ISSN:1084-4309
          EISSN:1557-7309
          DOI:10.1145/1297666

          Copyright © 2008 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 6 February 2008
          • Accepted: 1 July 2007
          • Revised: 1 May 2007
          • Received: 1 May 2006
          Published in TODAES Volume 13, Issue 1

          Qualifiers

          • research-article
          • Research
          • Refereed
