research-article

Workload assignment considering NBTI degradation in multicore systems

Published: 13 January 2014

Abstract

With continued technology scaling, reliability issues such as Negative Bias Temperature Instability (NBTI) have caused considerable degradation of device performance and, ultimately, a short mean time to failure (MTTF) for the whole multicore system. This article proposes a new workload balancing scheme, based on a device-level fractional NBTI model, that balances the workload among active cores while allowing stressed cores to relax. Starting from NBTI-induced threshold voltage degradation, we define the Capacity Rate (CR) as an indicator of a core's ability to accept workload. The capacity rate captures a core's performance variability, in terms of delay and power metrics, under NBTI aging. The proposed workload balancing framework uses the capacity rates as workload constraints, applies a Dynamic Zoning (DZ) algorithm to group cores into zones that process task flows, and then uses Dynamic Task Scheduling (DTS) to allocate tasks within each zone with balanced workload and minimum communication cost. Experimental results on a 64-core system show that, by allowing a small fraction of the cores to relax over a short time period, the proposed methodology improves multicore system yield (percentage of core failures) by 20% and extends MTTF by 30%, with insignificant performance degradation (less than 3%).
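
The idea of deriving a per-core capacity from NBTI aging and then weighting workload by it can be illustrated with a minimal sketch. This is not the article's method: instead of its device-level fractional NBTI model, it assumes a simplified long-term power-law shift, dVth = A * (duty * t)^n with n = 1/6, and an alpha-power-law gate delay. The capacity-rate definition (fresh delay divided by aged delay, ignoring power), the relax threshold, and every constant below are hypothetical, chosen only for illustration. Dynamic Zoning and Dynamic Task Scheduling are not modeled.

```python
# Hypothetical constants for illustration only; none come from the article.
VDD = 1.0            # supply voltage (V)
VTH0 = 0.30          # fresh threshold voltage (V)
ALPHA = 1.3          # alpha-power-law exponent
A_NBTI = 0.003       # assumed NBTI prefactor (V per second^n)
N_EXP = 1.0 / 6      # assumed long-term NBTI time exponent


def delta_vth(stress_time_s, duty_cycle):
    """Assumed power-law NBTI shift: dVth = A * (duty * t)^n."""
    return A_NBTI * (duty_cycle * stress_time_s) ** N_EXP


def gate_delay(vth):
    """Alpha-power-law delay, up to a constant factor."""
    return VDD / (VDD - vth) ** ALPHA


def capacity_rate(stress_time_s, duty_cycle):
    """Hypothetical CR: fresh delay / aged delay (1.0 means no aging)."""
    aged_vth = VTH0 + delta_vth(stress_time_s, duty_cycle)
    return gate_delay(VTH0) / gate_delay(aged_vth)


def balance(total_load, crs, relax_threshold):
    """Split total_load in proportion to CR; cores whose CR is below
    relax_threshold receive no work so they can recover."""
    weights = [cr if cr >= relax_threshold else 0.0 for cr in crs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no core can accept workload")
    return [total_load * w / total for w in weights]


# Usage: four cores with different (hypothetical) stress histories.
YEAR_S = 365 * 24 * 3600
RELAX_THRESHOLD = 0.89
duty_cycles = [0.9, 0.7, 0.5, 0.2]
crs = [capacity_rate(3 * YEAR_S, d) for d in duty_cycles]
shares = balance(100.0, crs, RELAX_THRESHOLD)
```

Weighting by CR gives less-degraded cores proportionally more work. The threshold models the "relax" step: a heavily stressed core is temporarily idled, since NBTI partially recovers when stress is removed.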

    • Published in

      ACM Journal on Emerging Technologies in Computing Systems, Volume 10, Issue 1
      Special Issue on Reliability and Device Degradation in Emerging Technologies and Special Issue on WoSAR 2011
      January 2014
      210 pages
      ISSN: 1550-4832
      EISSN: 1550-4840
      DOI: 10.1145/2543749

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 13 January 2014
      • Accepted: 1 November 2012
      • Revised: 1 July 2012
      • Received: 1 March 2012

      Qualifiers

      • research-article
      • Research
      • Refereed
