Abstract
Huge data volumes and the emergence of new parallel architectures, e.g., multicore CPUs, have led to revisiting classic computer science topics such as in-place sequence rotation. In-place sequence rotation is a basic step in several fundamental computing tasks. The sequential algorithms for in-place sequence rotation are classic and well studied, and they fall into three classes. Recently, Intel introduced a parallel standard template library (STL) implementation for multicore CPU systems; it has a rotation function based on rotation by copy, but its space complexity is \(O(n)\). In this work, we propose the blend rotation, a parallel-friendly, in-place algorithm that combines the merits of these three classes of rotation algorithms. In addition, we propose a set of Parallel In-place SeQuence RoTation (PI-sqrt) implementations. The performance of PI-sqrt is examined through several experiments. The measured running times show that, to the best of our knowledge, the implementations of the blend and reversal rotations are by far the fastest parallel implementations; averaged over the different experiments, they are 7.85\(\times\) and 5.52\(\times\) faster, respectively, than the parallel rotation function of Intel parallel STL.
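To make the notion of in-place rotation concrete, the following is a minimal sequential sketch of the classic rotation by reversal, one of the well-known rotation techniques. It is not the PI-sqrt implementation or the blend rotation proposed in this work; the function name `rotate_left_by_reversal` is hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative helper (hypothetical name, not taken from the paper):
// rotates `seq` left by `k` positions using three reversals.
// Extra space is O(1); every element is moved at most twice.
template <typename T>
void rotate_left_by_reversal(std::vector<T>& seq, std::size_t k) {
    const std::size_t n = seq.size();
    if (n == 0) return;   // nothing to rotate
    k %= n;               // rotating by n is the identity
    if (k == 0) return;

    const auto mid = seq.begin() + static_cast<std::ptrdiff_t>(k);
    std::reverse(seq.begin(), mid);        // reverse the first k elements
    std::reverse(mid, seq.end());          // reverse the remaining n - k elements
    std::reverse(seq.begin(), seq.end());  // reverse the whole sequence
}
```

For example, rotating `{1, 2, 3, 4, 5}` left by 2 yields `{3, 4, 5, 1, 2}`. Each reversal consists of independent element swaps, which is one reason this class of algorithm lends itself to parallelization.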
Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61860206011 and the Program of the National Natural Science Foundation of China under Grants 61876061 and 61876164.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Informed consent
M.H.
About this article
Cite this article
Hashem, M., Li, K. & Salah, A. PI-sqrt: novel parallel implementations of in-place sequence rotation on multicore systems. Cluster Comput 26, 539–557 (2023). https://doi.org/10.1007/s10586-022-03591-6
DOI: https://doi.org/10.1007/s10586-022-03591-6