Performance and Energy Evaluation of Different Multi-Threading Interfaces in Embedded and General Purpose Systems

Journal of Signal Processing Systems

Abstract

In current systems, it is necessary to exploit the availability of multiple cores while also consuming less energy. To speed up the development process and make it as transparent as possible to the programmer, parallelism is exploited through Application Programming Interfaces (APIs). However, each of these APIs implements a different way of exchanging data through shared memory regions and, as a consequence, each has a different level of energy consumption. In this paper, considering general purpose and embedded systems, we show how each API influences performance, energy consumption, and Energy-Delay Product (EDP). For example, Pthreads consumes 12% less energy on average than OpenMP and MPI considering all benchmarks. We also demonstrate that the difference in EDP among the APIs can be up to 81%, while the level of efficiency (e.g., performance or energy consumption per core) changes as the number of threads increases, depending on whether the system is embedded or general purpose.
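
To make the Energy-Delay Product concrete, the sketch below uses the standard definition, EDP = energy × execution time, where lower is better. The API labels and all measurements are hypothetical placeholders chosen for illustration; they are not results from the paper.

```python
def energy_delay_product(energy_joules: float, time_seconds: float) -> float:
    """Return EDP in joule-seconds: total energy multiplied by execution time."""
    return energy_joules * time_seconds


def relative_reduction(value: float, baseline: float) -> float:
    """Return the fractional reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline


# Hypothetical (energy in J, time in s) pairs for two unnamed APIs.
# These numbers are assumptions for illustration only.
runs = {
    "api_a": (50.0, 2.0),  # more energy, but finishes sooner
    "api_b": (40.0, 3.0),  # less energy, but takes longer
}

edp = {name: energy_delay_product(e, t) for name, (e, t) in runs.items()}
best = min(edp, key=edp.get)

for name, value in edp.items():
    print(f"{name}: EDP = {value:.1f} J*s")
print(f"lowest EDP: {best}")
```

In this made-up case the API that consumes more energy still has the lower EDP because it finishes sooner, which is why energy and EDP can rank the same set of APIs differently.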



Author information


Corresponding author

Correspondence to Arthur Francisco Lorenzon.


About this article


Cite this article

Lorenzon, A.F., Cera, M.C. & Schneider Beck, A.C. Performance and Energy Evaluation of Different Multi-Threading Interfaces in Embedded and General Purpose Systems. J Sign Process Syst 80, 295–307 (2015). https://doi.org/10.1007/s11265-014-0925-9


  • DOI: https://doi.org/10.1007/s11265-014-0925-9
