Abstract
Switching activity and instruction cycles are two of the most important factors in power dissipation when the supply voltage is fixed. This paper studies the scheduling and assignment problems of minimizing the total energy consumed by both instruction processing and switching activity for applications with loops on multi-core, multi-Functional-Unit (multi-FU) architectures. An algorithm, EMPLS (Energy Minimization with Probability using Loop Scheduling), is proposed to minimize the total energy (E) while satisfying a timing constraint (L) with a guaranteed probability (P). The algorithm performs scheduling and assignment simultaneously, and it outperforms approaches that treat scheduling and assignment in separate phases. Compared with previous work, it achieves a significant reduction in total energy.
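The optimization target stated above (minimize E subject to finishing within L with probability at least P) can be illustrated with a small brute-force sketch. This is not EMPLS: it ignores loops, retiming, and multiple cores, and simply enumerates FU assignments for a serial chain of tasks. The task table, the FU names, the `SWITCH_ENERGY` penalty, and the functions `prob_within` and `best_assignment` are all hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical data: for each task, each FU type offers
# (processing energy, [(execution time, probability), ...]).
TASKS = {
    "a": {"FU1": (6.0, [(1, 0.8), (2, 0.2)]), "FU2": (2.0, [(2, 0.7), (4, 0.3)])},
    "b": {"FU1": (5.0, [(1, 0.9), (3, 0.1)]), "FU2": (1.5, [(3, 0.6), (5, 0.4)])},
}

# Assumed flat penalty when two consecutive tasks run on the same FU type.
# A crude stand-in for switching-activity energy, not the paper's model.
SWITCH_ENERGY = 0.5


def prob_within(dists, deadline):
    """Probability that the sum of independent task times is <= deadline."""
    totals = {0: 1.0}  # maps accumulated time -> probability
    for dist in dists:
        nxt = {}
        for t0, p0 in totals.items():
            for t, p in dist:
                nxt[t0 + t] = nxt.get(t0 + t, 0.0) + p0 * p
        totals = nxt
    return sum(p for t, p in totals.items() if t <= deadline)


def best_assignment(tasks, deadline, min_prob):
    """Return (energy, assignment, probability) with minimum energy among
    assignments meeting the deadline with probability >= min_prob, or None."""
    names = list(tasks)
    best = None
    for choice in product(*(tasks[n] for n in names)):
        energy = sum(tasks[n][fu][0] for n, fu in zip(names, choice))
        energy += SWITCH_ENERGY * sum(x == y for x, y in zip(choice, choice[1:]))
        p = prob_within([tasks[n][fu][1] for n, fu in zip(names, choice)], deadline)
        if p >= min_prob and (best is None or energy < best[0]):
            best = (energy, dict(zip(names, choice)), p)
    return best


if __name__ == "__main__":
    print(best_assignment(TASKS, deadline=4, min_prob=0.9))
    print(best_assignment(TASKS, deadline=5, min_prob=0.9))
```

In this toy data, relaxing the deadline from 4 to 5 time units lets the search move task "a" to the slower, lower-energy unit while still meeting the probability bound, which is the trade-off among E, L, and P that the abstract describes.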
Cite this article
Qiu, M., Liu, M., Li, H. et al. Energy-Aware Loop Scheduling and Assignment for Multi-Core, Multi-Functional-Unit Architecture. J Sign Process Syst Sign Image Video Technol 57, 363–379 (2009). https://doi.org/10.1007/s11265-008-0312-5
DOI: https://doi.org/10.1007/s11265-008-0312-5