Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach

Savadi, Abdorreza; Deldari, Hossein

doi:10.1007/s11227-013-1018-4

Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach

Published: 18 September 2013

Volume 67, pages 565–584, (2014)
Cite this article

The Journal of Supercomputing Aims and scope Submit manuscript

Abdorreza Savadi¹ &
Hossein Deldari¹

241 Accesses
Explore all metrics

Abstract

Computer benchmarking is a common method for measuring the parameters of a computational model. It helps to measure the parameters of any computer. With the emergence of multicore computers, the evaluation of computers was brought under consideration. Since these types of computers can be viewed and considered as parallel computers, the evaluation methods for parallel computers may be appropriate for multicore computers. However, because multicore architectures seriously focus on cache hierarchy, there is a need for new and different benchmarks to evaluate them correctly.

To this end, this paper presents a method for measuring the parameters of one of the most famous multicore computational models, namely Multi-Bulk Synchronous Parallel (Multi-BSP). This method measures the hardware latency parameters of multicore computers, namely communication latency (g _i) and synchronization latency (L _i) for all levels of the cache memory hierarchy in a bottom-up manner. By determining the parameters, the performance of algorithms on multicore architectures can be evaluated as a sequence.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A semi-Automatic Approach for Parallel Problem Solving using the Multi-BSP Model

Article 01 December 2019

Automatic Cost Analysis for Imperative BSP Programs

Article 21 February 2018

A Hierarchical Roofline-based Benchmarking System for Detecting Performance Bottlenecks in Modern Processors

Article 01 April 2022

References

Aggarwal A, Vitter JS (1988) The input/output complexity of sorting and related problems. Commun ACM 31:1116–1127. doi:10.1145/48529.48535
Article MathSciNet Google Scholar
Alpern B, Carter L, Feig E, Selker T (1994) The uniform memory hierarchy model of computation. Algorithmica 12:72–109. doi:10.1007/BF01185206
Article MATH MathSciNet Google Scholar
Arge L, Goodrich MT, Nelson M, Sitchinava N (2008) Fundamental parallel algorithms for private-cache chip multiprocessors. In: Proceedings of the twentieth annual symposium on parallelism in algorithms and architectures, SPAA ’08. ACM, New York, pp 197–206. http://doi.acm.org/10.1145/1378533.1378573
Chapter Google Scholar
Arge L, Goodrich MT, Sitchinava N, Nelson M (2008) Fundamental parallel algorithms for privatecache chip multiprocessors. In: 20th ACM symposium on parallelism in algorithm and architectures (SPAA). ACM, New York, pp 197–206
Google Scholar
Bailey D, Barszcz E, Barton J, Browning D, Carter R, Dagum L, Fatoohi R, Frederickson P, Lasinski T, Schreiber R, Simon H, Venkatakrishnan V, Weeratunga S (1991) The nas parallel benchmarks summary and preliminary results. In: Supercomputing, 1991. Proceedings of the 1991 ACM/IEEE conference on supercomputing, pp 158–165
Chapter Google Scholar
Bisseling RH (2004) Parallel scientific computation: a structured approach using BSP and MPI. Oxford University Press, Oxford
Book Google Scholar
Blelloch GE, Chowdhury RA, Gibbons PB, Ramachandran V, Chen S, Kozuch M (2008) Provably good multicore cache performance for divide-and-conquer algorithms. In: Proceedings of the nineteenth annual ACM-SIAM symposium on discrete algorithms, SODA ’08. Society for Industrial and Applied Mathematics, Philadelphia, pp 501–510. http://portal.acm.org/citation.cfm?id=1347082.1347137
Google Scholar
Blelloch GE, Fineman JT, Gibbons PB, Simhadri HV (2011) Scheduling irregular parallel computations on hierarchical caches. In: Proceedings of the 23rd ACM symposium on parallelism in algorithms and architectures, SPAA ’11. ACM, New York, pp 355–366. http://doi.acm.org/10.1145/1989493.1989553
Chapter Google Scholar
Butenhof DR Programming with POSIX threads. Addison-Wesley
Che S, Boyer M, Meng J, Tarjan D, Sheaffer J, Lee SH, Skadron K (2009) Servet: a benchmark suite for autotuning on multicore clusters. In: Proceedings of international symposium on workload characterization, IISWC2009, pp 44–54
Chapter Google Scholar
Chowdhury R, Silvestri F, Blakeley B, Ramachandran V (2010) Oblivious algorithms for multicores and network of processors. In: IEEE international symposium on parallel distributed processing (IPDPS), pp 1–12
Google Scholar
Culler DE, Karp RM, Patterson D, Sahay A, Santos EE, Schauser KE, Subramonian R, von Eicken T (1996) Logp: a practical model of parallel computation. Commun ACM 39:78–85. http://doi.acm.org/10.1145/240455.240477
Article Google Scholar
Fortune S, Wyllie J (1978) Parallelism in random access machines. In: Proceedings of the tenth annual ACM symposium on theory of computing, STOC ’78. ACM, New York, pp 114–118. http://doi.acm.org/10.1145/800133.804339
Chapter Google Scholar
Frigo M, Leiserson CE, Prokop H, Ramachandran S (1999) Cache-oblivious algorithms. In: Annual IEEE symposium on foundations of computer science, p 285
Google Scholar
Gal-On S, Levy M (2008) Measuring multicore performance. Computer 41:99–102
Article Google Scholar
Gerbessiotis AV, Lee SY (2004) Remote memory access: a case for portable, efficient and library independent parallel programming. Sci Program 12(3):169–183. http://dl.acm.org/citation.cfm?id=1240140.1240144
Google Scholar
Gonzalez-Dominguez J, Taboada G, Fraguela B, Martin M, Tourio J (2010) Servet: a benchmark suite for autotuning on multicore clusters. In: Proceedings of international symposium on parallel and distributed processing, IPDPS2010, pp 1–9
Google Scholar
Hill JMD, Donaldson SR, Skillicorn DB (1997) Stability of communication performance in practice: From the cray t3e to networks of workstations. Technical Report PRG-TR-33-97, Programming Research Group, Oxford University Computing Laboratory
Hill JMD, Skillicorn DB (1998) Practical barrier synchronisation. In: 6th EuroMicro workshop on parallel and distributed processing (PDP’98). IEEE Computer Society Press, Los Alamitos, pp 438–444
Google Scholar
Hill JM, McColl B, Stefanescu DC, Goudreau MW, Lang K, Rao SB, Suel T, Tsantilas T, Bisseling RH (1998) Bsplib: the bsp programming library. Parallel Comput 24(14):1947–1980. http://www.sciencedirect.com/science/article/pii/S0167819198000933
Article Google Scholar
Kayi A, Yao Y, El-Ghazawi T, Newby G (2007) Experimental evaluation of emerging multi-core architectures. In: Proceedings of 21th international parallel and distributed processing symposium, IPDPS2007
Google Scholar
Linux-operating-systems. http://linux.die.net
Mellor-Crummey JM, Scott ML (1991) Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans Comput Syst 9(1):21–65. http://doi.acm.org/10.1145/103727.103729
Article Google Scholar
Ramachandran V (1997) Qsm: a general purpose shared-memory model for parallel computation. In: Ramesh S, Sivakumar G (eds) Foundations of software technology and theoretical computer science. Lecture notes in computer science, vol 1346. Springer, Berlin, pp 1–5. doi:10.1007/BFb0058018
Chapter Google Scholar
Sampson J, González R, Collard JF, Jouppi NP, Schlansker M (2005) Fast synchronization for chip multiprocessors. SIGARCH Comput Archit News 33(4):64–69. http://doi.acm.org/10.1145/1105734.1105743
Article Google Scholar
Savage J (1995) Extending the hong-kung model to memory hierarchies. In: Du DZ, Li M (eds) Computing and combinatorics. Lecture notes in computer science, vol 959. Springer, Berlin, pp 270–281. doi:10.1007/BFb0030842
Chapter Google Scholar
Savage JE, Zubair M (2008) A unified model for multicore architectures. In: Proceedings of the 1st international forum on next-generation multicore/manycore technologies, IFMT ’08. ACM, New York, pp 9:1–9:12. http://doi.acm.org/10.1145/1463768.1463780
Google Scholar
Singh JP, Weber WD, Gupta A (1992) Splash: Stanford parallel applications for shared-memory. SIGARCH Comput Archit News 20(1):5–44. http://doi.acm.org/10.1145/130823.130824
Article Google Scholar
S.P.E.C.S. benchmarks http://www.spec.org/
Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33:103–111. http://doi.acm.org/10.1145/79173.79181
Article Google Scholar
Valiant LG (2011) A bridging model for multi-core computing. J Comput Syst Sci 77(1):154–166. http://www.sciencedirect.com/science/article/B6WJ0-508X3P4-5/2/92a9dec04839a9e93887ca26e0d9b3f2, celebrating Karp’s Kyoto Prize
Article MATH MathSciNet Google Scholar
Yang TF, Lin CH, Yang CL (2010) Cache-aware task scheduling on multi-core architecture. In: 2010 international symposium on VLSI design automation and test (VLSI-DAT), pp 139–142. doi:10.1109/VDAT.2010.5496710
Chapter Google Scholar
Yotov K, Roeder T, Pingali K, Gunnels J, Gustavson F (2007) An experimental comparison of cache-oblivious and cache-conscious programs. In: Proceedings of the nineteenth annual ACM symposium on parallel algorithms and architectures, SPAA ’07. ACM, New York, pp 93–104. http://doi.acm.org/10.1145/1248377.1248394
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Computer Engineering Dept., Ferdowsi University of Mashhad, FUM, Mashhad, Iran
Abdorreza Savadi & Hossein Deldari

Authors

Abdorreza Savadi
View author publications
You can also search for this author in PubMed Google Scholar
Hossein Deldari
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Abdorreza Savadi.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Savadi, A., Deldari, H. Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach. J Supercomput 67, 565–584 (2014). https://doi.org/10.1007/s11227-013-1018-4

Download citation

Published: 18 September 2013
Issue Date: February 2014
DOI: https://doi.org/10.1007/s11227-013-1018-4

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A semi-Automatic Approach for Parallel Problem Solving using the Multi-BSP Model

Automatic Cost Analysis for Imperative BSP Programs

A Hierarchical Roofline-based Benchmarking System for Detecting Performance Bottlenecks in Modern Processors

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A semi-Automatic Approach for Parallel Problem Solving using the Multi-BSP Model

Automatic Cost Analysis for Imperative BSP Programs

A Hierarchical Roofline-based Benchmarking System for Detecting Performance Bottlenecks in Modern Processors

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation