Abstract
As technology continues to shrink, reliability issues such as Negative Bias Temperature Instability (NBTI) have caused considerable degradation of device performance and, ultimately, a short mean-time-to-failure (MTTF) for the whole multicore system. This article proposes a new workload balancing scheme, based on a device-level fractional NBTI model, that balances the workload among active cores while relaxing stressed ones. Starting from NBTI-induced threshold voltage degradation, we define the Capacity Rate (CR) as an indication of a core's ability to accept workload. The capacity rate captures a core's performance variability in terms of delay and power metrics under the impact of NBTI aging. The proposed workload balancing framework uses the capacity rates as workload constraints, applies a Dynamic Zoning (DZ) algorithm to group cores into zones that process task flows, and then uses Dynamic Task Scheduling (DTS) to allocate tasks within each zone with balanced workload and minimum communication cost. Experimental results on a 64-core system show that, by allowing a small fraction of the cores to relax over a short time period, the proposed methodology improves multicore system yield (measured as the percentage of core failures) by 20% and extends MTTF by 30%, with insignificant performance degradation (less than 3%).
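To make the idea of using capacity rates as workload constraints more concrete, the following is a minimal sketch and not the article's DZ or DTS algorithm, whose details the abstract does not give. It assumes a hypothetical normalization in which a capacity rate of 1.0 denotes a fresh core and lower values denote more NBTI-aged cores, a hypothetical `relax_threshold` below which a core is left idle to recover, and a simple greedy rule that places each task on the active core with the most remaining headroom. Communication cost and zoning are omitted; all names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    core_id: int
    capacity_rate: float  # assumed scale: 1.0 = fresh core, lower = more aged
    load: float = 0.0
    tasks: list = field(default_factory=list)


def assign_tasks(cores, tasks, relax_threshold=0.3):
    """Greedy capacity-constrained assignment (illustrative only).

    tasks is a list of (name, cost) pairs, with cost expressed in the same
    units as capacity_rate. Returns the names of tasks that fit on no core.
    """
    # Cores whose capacity rate is below the threshold are relaxed (idle).
    active = [c for c in cores if c.capacity_rate >= relax_threshold]
    unplaced = []
    # Place the most expensive tasks first.
    for name, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        # A core is a candidate only if the task keeps it within its capacity.
        candidates = [c for c in active if c.load + cost <= c.capacity_rate]
        if not candidates:
            unplaced.append(name)
            continue
        # Pick the candidate with the largest remaining headroom.
        target = max(candidates, key=lambda c: c.capacity_rate - c.load)
        target.load += cost
        target.tasks.append(name)
    return unplaced


if __name__ == "__main__":
    cores = [Core(0, 1.0), Core(1, 0.7), Core(2, 0.2)]
    tasks = [("a", 0.5), ("b", 0.4), ("c", 0.25), ("d", 0.25)]
    leftover = assign_tasks(cores, tasks)
    for core in cores:
        print(core.core_id, core.tasks, core.load)
    print("unplaced:", leftover)
```

In this sketch the most degraded core receives no work at all, which mirrors the abstract's notion of relaxing stressed cores, while the remaining tasks are spread across active cores in proportion to their headroom.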