Abstract
Power dissipation can be minimized at the algorithmic, compiler, architectural, logic, and circuit levels. Recent research on multicore programming models suggests that parallel design patterns are a promising way to develop multicore applications. Because parallel design patterns have a regular structure, we see them as an opportunity to apply power optimizations in the software layer. In this paper, we investigate compilation for low power with parallel design patterns on embedded multicore systems. We evaluate four major parallel design patterns: Pipe and Filter, MapReduce with Iterator, Puppeteer, and the Bulk Synchronous Parallel (BSP) model. Our work devises compiler power optimization schemes that exploit the recurring patterns of embedded multicore programs. The proposed schemes are rate-based optimization for the Pipe and Filter pattern, early-exit power optimization for the MapReduce with Iterator pattern, a power-aware mapping algorithm for the Puppeteer pattern, and a multi-phase power gating scheme for the BSP pattern. In our experiments, real-world multicore applications are evaluated on a multicore power simulator, and the results show significant power reductions. Our work thus points to a direction for power optimization: identifying additional key design patterns for embedded multicore systems and exploring the power optimization opportunities they offer to compilers.
Acknowledgments
This work is supported in part by the Ministry of Economic Affairs under grant no. 101-EC-17-A-02-S1-202 and by the National Science Council under grant nos. 102-2219-E-007-001 and 102-2220-E-007-001 in Taiwan. The authors also thank Prof. Shang-Hong Lai and his students Yu-Chun Chen and Te-Feng Su of National Tsing Hua University, Taiwan, for providing the vehicle detection application used as a test case in our experiments.
Cite this article
Lin, CY., Kuan, CB., Shih, WL. et al. Compilers for Low Power with Design Patterns on Embedded Multicore Systems. J Sign Process Syst 80, 277–293 (2015). https://doi.org/10.1007/s11265-014-0917-9
DOI: https://doi.org/10.1007/s11265-014-0917-9