Abstract
Current systems must exploit the availability of multiple cores while also consuming less energy. To speed up development and keep it as transparent as possible to the programmer, parallelism is exploited through Application Programming Interfaces (APIs). However, each of these APIs implements a different way of exchanging data through shared memory regions and, consequently, each has a different level of energy consumption. In this paper, considering general-purpose and embedded systems, we show how each API influences performance, energy consumption, and Energy-Delay Product (EDP). For example, Pthreads consumes 12% less energy on average than OpenMP and MPI across all benchmarks. We also demonstrate that the difference in EDP among the APIs can be as high as 81%, while the level of efficiency (e.g., performance or energy consumption per core) changes as the number of threads increases, depending on whether the system is embedded or general purpose.
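The Energy-Delay Product compared above can be sketched as follows, assuming the conventional definition (energy consumed multiplied by execution time); the abstract itself does not spell out the formula. The API labels and the energy and time values are hypothetical placeholders chosen for easy arithmetic, not measurements from the paper.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """EDP under the conventional definition: energy multiplied by execution time."""
    return energy_joules * delay_seconds


def relative_difference(value, baseline):
    """Difference of `value` with respect to `baseline`, as a percentage."""
    return (value - baseline) / baseline * 100.0


# Hypothetical (energy in joules, execution time in seconds) per API.
# These numbers are illustrative only and do not come from the paper.
measurements = {
    "api_a": (10.0, 2.0),
    "api_b": (12.0, 2.5),
}

edp = {
    name: energy_delay_product(energy, delay)
    for name, (energy, delay) in measurements.items()
}

for name, value in edp.items():
    print(f"{name}: EDP = {value} J*s")

gap = relative_difference(edp["api_b"], edp["api_a"])
print(f"api_b EDP is {gap}% higher than api_a")
```

Because EDP multiplies the two quantities, an API that is both slower and more energy-hungry is penalized twice, which is one reason EDP gaps between APIs can be larger than the gaps in energy alone.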
Cite this article
Lorenzon, A.F., Cera, M.C. & Schneider Beck, A.C. Performance and Energy Evaluation of Different Multi-Threading Interfaces in Embedded and General Purpose Systems. J Sign Process Syst 80, 295–307 (2015). https://doi.org/10.1007/s11265-014-0925-9
DOI: https://doi.org/10.1007/s11265-014-0925-9