Abstract
Many concurrency platforms offer a processor-oblivious model of computation, in which the scheduler dynamically distributes work across threads. While this is convenient, it introduces non-determinism at runtime, which complicates debugging because a program may produce a different output on each run. Leiserson et al. [PPoPP ’12] persuaded Intel to modify its C/C++ compiler, which provided the Cilk Plus concurrency platform, to include a feature called pedigrees, which enables determinism by uniquely identifying strands with low overhead. They used pedigrees to design a deterministic parallel random-number generator (DPRNG) called DOTMIX, which hashes a pedigree and then mixes the result into a random number for a given strand. Improving the efficiency of DOTMIX by using a faster hash function is an open problem put forth by Leiserson et al. [PPoPP ’12]. We address this problem by introducing DOTMIX-Pro, which replaces the compression function used in DOTMIX with a faster universal hash function family called Square Hash, due to Etzel et al. [CRYPTO ’99], and some of its variants. Together with other optimizations, DOTMIX-Pro can be up to \(31\%\) faster than DOTMIX. Additionally, we introduce a generalization of Square Hash that works with arbitrary moduli.
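The compress-then-mix structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the prime modulus \(2^{64}-59\), the four mixing rounds, and the RC6-style mixing step \(z \mapsto 2z^2 + z \bmod 2^{64}\) followed by a swap of the two 32-bit halves are all assumptions chosen for illustration. The sketch contrasts a dot-product compression with a Square Hash style compression, in which each multiplication is replaced by a squaring.

```python
# Illustrative sketch only: all names and constants below are assumptions,
# not taken from the DOTMIX-Pro paper itself.

P = 2**64 - 59          # assumed prime modulus (largest prime below 2^64)
MASK64 = 2**64 - 1      # keeps values within 64 bits


def dot_compress(pedigree, gamma):
    """Dot-product compression: sum of gamma_k * j_k, reduced mod P.

    `gamma` is a fixed random key vector at least as long as the pedigree.
    """
    return sum(g * j for g, j in zip(gamma, pedigree)) % P


def square_compress(pedigree, gamma):
    """Square Hash style compression: sum of (j_k + gamma_k)^2, reduced mod P."""
    return sum((j + g) ** 2 for g, j in zip(gamma, pedigree)) % P


def mix(z, rounds=4):
    """Assumed RC6-style mixing: z -> 2z^2 + z mod 2^64, then swap 32-bit halves."""
    for _ in range(rounds):
        z = (2 * z * z + z) & MASK64
        z = ((z << 32) | (z >> 32)) & MASK64
    return z


def dprng(pedigree, gamma, compress=square_compress):
    """Map a strand's pedigree to a 64-bit value: compress, then mix."""
    return mix(compress(pedigree, gamma))
```

For a fixed key vector such as `gamma = [3, 5, 7]`, calling `dprng([1, 2, 3], gamma)` returns the same value on every run regardless of how the work is scheduled, because the result depends only on the pedigree and the key. Passing `compress=dot_compress` switches to the dot-product variant for comparison.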
References
Bibak K (2020) Restricted Congruences in Computing. CRC Press, USA
Bibak K, Kapron BM, Srinivasan V (2016) MMH\(^*\) with arbitrary modulus is always almost-universal. Inf Process Lett 116(7):481–483
Bibak K, Kapron BM, Srinivasan V, Tóth L (2018) On an almost-universal hash function family with applications to authentication and secrecy codes. Int J Found Comput Sci 29(03):357–375
Carter JL, Wegman MN (1979) Universal classes of hash functions. J Comput Syst Sci 18(2):143–154
Cicirello VA (2018) Impact of random number generation on parallel genetic algorithms. In: The Thirty-First International FLAIRS Conference
Contini S, Rivest RL, Robshaw MJB, Yin YL (1998) The security of the RC6 block cipher. https://people.csail.mit.edu/rivest/ContiniRivestRobshawYin-TheSecurityOfTheRC6BlockCipher.pdf
Cook SA, Aanderaa SO (1969) On the minimum computation time of functions. Trans Am Math Soc 142:291–314
Etzel M, Patel S, Ramzan Z (1999) Square hash: fast message authentication via optimized universal hash functions. In: Wiener M (ed) Advances in Cryptology - CRYPTO ’99. Lecture Notes in Computer Science. Springer, Berlin Heidelberg, pp 234–251
Halevi S, Krawczyk H (1997) MMH: Software message authentication in the Gbit/second rates. In: Biham E (ed) Fast Software Encryption. Lecture Notes in Computer Science. Springer, Berlin Heidelberg, pp 172–189
Karatsuba A, Ofman YP (1962) Multiplication of many-digital numbers by automatic computers. Dokl. Akad. Nauk SSSR 145(2):293–294
Knuth DE (1997) The Art of Computer Programming, vol. 2: Seminumerical Algorithms, 3rd ed. Addison-Wesley
Lehmer DN (1913) Certain theorems in the theory of quadratic residues. Am Math Mon 20(5):151–157
Leiserson CE, Schardl TB, Lee IA (2018) Cilk hub. https://cilk.mit.edu
Leiserson CE, Schardl TB, Sukha J (2012) Deterministic parallel random-number generation for dynamic-multithreading platforms. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’12, pp 193–204
Oracle. Class SplittableRandom. https://docs.oracle.com/javase/8/docs/api/java/util/SplittableRandom.html
Paar C (2015) Implementation of Cryptographic Schemes. Ruhr University Bochum
Rivest RL, Robshaw MJB, Sidney R, Yin YL (1998) The RC6 block cipher. https://people.csail.mit.edu/rivest/pubs/RRSY98.pdf
Schardl TB (2016) Performance engineering of multicore software: developing a science of fast code for the post-Moore era. PhD thesis, Massachusetts Institute of Technology
Schardl TB, Lee IA, Leiserson CE (2018) Brief announcement: Open Cilk. In: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18, pp 351–353
Schardl TB, Moses WS, Leiserson CE (2019) Tapir: Embedding recursive fork-join parallelism into LLVM’s intermediate representation. ACM Transactions on Parallel Computing 6(4):19:1–19:33
Schönhage A, Strassen V (1971) Schnelle Multiplikation großer Zahlen. Computing 7(3):281–292
Steele GL Jr, Lea D, Flood CH (2014) Fast splittable pseudorandom number generators. In: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA ’14, pp 453–472, New York, NY, USA
Toom AL (1963) The complexity of a scheme of functional elements realizing the multiplication of integers. Sov Math Doklady 3:714–716
Utterback R, Agrawal K, Lee IA, Kulkarni M (2017) Processor-oblivious record and replay. In: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’17, pp 145–161, Austin, Texas, USA. https://doi.org/10.1145/3018743.3018764
Acknowledgements
The authors would like to thank the editor and the referees for carefully reading the paper and for their useful comments which helped improve the paper.
About this article
Cite this article
Ritchie, R., Bibak, K. DOTMIX-Pro: faster and more efficient variants of DOTMIX for dynamic-multithreading platforms. J Supercomput 78, 945–961 (2022). https://doi.org/10.1007/s11227-021-03904-3
DOI: https://doi.org/10.1007/s11227-021-03904-3