Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach

The Journal of Supercomputing

Abstract

Benchmarking is a common way to measure the parameters of a computational model on a given computer. The emergence of multicore computers has raised the question of how such machines should be evaluated. Because multicore computers can be viewed as parallel computers, evaluation methods developed for parallel computers may also be appropriate for them. However, multicore architectures depend heavily on the cache hierarchy, so new benchmarks are needed to evaluate them correctly.

To this end, this paper presents a method for measuring the parameters of one of the best-known multicore computational models, Multi-Bulk Synchronous Parallel (Multi-BSP). The method measures the hardware latency parameters of multicore computers, namely the communication latency ($g_i$) and the synchronization latency ($L_i$), for every level of the cache memory hierarchy in a bottom-up manner. Once these parameters are determined, the performance of algorithms on multicore architectures can be evaluated as a sequence.
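As a rough illustration of how measured latency parameters might feed a performance estimate, the sketch below applies the classic BSP-style superstep cost (local work, plus words communicated times the per-word cost, plus the synchronization cost) at each level of the hierarchy. This is an assumption made for illustration, not the cost analysis used in the paper. The names `LevelParams`, `superstep_cost` and `total_cost`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LevelParams:
    """Latency parameters of one hierarchy level (hypothetical container)."""
    g: float  # communication cost per word at this level
    L: float  # synchronization cost at this level


def superstep_cost(work: float, words: float, level: LevelParams) -> float:
    """BSP-style cost of one superstep: work + words * g + L."""
    return work + words * level.g + level.L


def total_cost(
    supersteps: Sequence[Tuple[int, float, float]],
    params: Sequence[LevelParams],
) -> float:
    """Sum the cost of supersteps given as (level index, work, words)."""
    return sum(superstep_cost(w, h, params[i]) for i, w, h in supersteps)


# Made-up parameters for a two-level hierarchy (not measured values).
example_params = [LevelParams(g=2.0, L=10.0), LevelParams(g=5.0, L=100.0)]

# One superstep at level 0 followed by one at level 1 (made-up workload).
example_supersteps = [(0, 1000.0, 50.0), (1, 0.0, 20.0)]

estimated = total_cost(example_supersteps, example_params)
```

With these made-up inputs the estimate is 1310.0 cost units: 1110.0 for the level-0 superstep and 200.0 for the level-1 superstep. In practice the $g_i$ and $L_i$ values would come from a benchmark run on the target machine.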


Author information

Corresponding author

Correspondence to Abdorreza Savadi.

About this article

Cite this article

Savadi, A., Deldari, H. Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach. J Supercomput 67, 565–584 (2014). https://doi.org/10.1007/s11227-013-1018-4

  • DOI: https://doi.org/10.1007/s11227-013-1018-4
