PI-sqrt: novel parallel implementations of in-place sequence rotation on multicore systems

Cluster Computing

Abstract

Huge data volumes and the emergence of new parallel architectures, e.g., multicore CPUs, have led to a revisiting of classic computer science topics such as in-place sequence rotation, a basic step in several fundamental computing tasks. Sequential in-place rotation algorithms are classic and well studied, and they fall into three classes. Recently, Intel introduced a parallel standard template library (STL) implementation for multicore CPU systems. It provides an in-place rotation function, but that function is based on rotation by copy, so its space complexity is \(O(n)\). In this work, we propose the blend rotation, a parallel-friendly, in-place algorithm that combines the merits of these three classes of rotation algorithms. We also propose PI-sqrt, a set of Parallel In-place SeQuence RoTation implementations, and examine its performance through several experiments. The measured running times show that our implementations of the blend and reversal rotations are, to the best of our knowledge, by far the fastest parallel implementations. Averaged across the experiments, they are 7.85\(\times\) and 5.52\(\times\) faster, respectively, than the parallel rotation function of Intel parallel STL.
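
The abstract contrasts rotation by copy, which needs an \(O(n)\) buffer, with in-place rotation. The following is a minimal sketch of one in-place approach mentioned above: reversal rotation, parallelized with std::thread in standard C++ (C++11 or later). It is not the authors' PI-sqrt code, the blend rotation, or Intel's parallel STL. The names parallel_reverse and parallel_reversal_rotate, the left-rotation convention, and the static chunking of work across threads are our own assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: reverse a[lo, hi) using up to num_threads threads.
// The mirrored pairs (lo + i, hi - 1 - i), i in [0, half), are split into
// contiguous chunks, one per thread. Chunks touch disjoint elements, so the
// threads need no locks.
template <typename T>
void parallel_reverse(std::vector<T>& a, std::size_t lo, std::size_t hi,
                      unsigned num_threads) {
    if (hi <= lo + 1) return;  // zero or one element: nothing to do
    const std::size_t half = (hi - lo) / 2;
    if (num_threads == 0) num_threads = 1;
    const std::size_t chunk = (half + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(half, begin + chunk);
        if (begin >= end) break;  // all pairs already assigned
        workers.emplace_back([&a, lo, hi, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                std::swap(a[lo + i], a[hi - 1 - i]);
        });
    }
    for (auto& w : workers) w.join();  // barrier before the next phase
}

// Hypothetical sketch: left-rotate a by k positions in place with the
// reversal method: reverse a[0, k), reverse a[k, n), then reverse a[0, n).
// Extra space is O(1) per thread, whereas rotation by copy needs an O(n)
// buffer.
template <typename T>
void parallel_reversal_rotate(std::vector<T>& a, std::size_t k,
                              unsigned num_threads) {
    const std::size_t n = a.size();
    if (n == 0) return;
    k %= n;
    if (k == 0) return;  // rotation by a multiple of n is the identity
    parallel_reverse(a, 0, k, num_threads);
    parallel_reverse(a, k, n, num_threads);
    parallel_reverse(a, 0, n, num_threads);
}
```

For example, rotating {0, 1, ..., 9} left by 3 with 4 threads yields {3, 4, 5, 6, 7, 8, 9, 0, 1, 2}. Each reversal is embarrassingly parallel, but the three phases must run in order, so the sketch joins all threads between them. That synchronization cost, and the fact that every element is moved twice, are the kinds of trade-offs a parallel in-place rotation design has to weigh.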

Figs. 1–11 (not available in this preview)

Data availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61860206011 and the Program of the National Natural Science Foundation of China under Grants 61876061 and 61876164.

Author information

Correspondence to Kenli Li.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Informed consent

M.H.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hashem, M., Li, K. & Salah, A. PI-sqrt: novel parallel implementations of in-place sequence rotation on multicore systems. Cluster Comput 26, 539–557 (2023). https://doi.org/10.1007/s10586-022-03591-6

  • DOI: https://doi.org/10.1007/s10586-022-03591-6
