Abstract
The numbers of magic series of large orders are computed on Intel Xeon Phi processors using an improved and optimized Montgomery multiplication algorithm. The number of magic series can be computed efficiently with Kinnaes’ formula, whose most time-consuming component is modular multiplication. We use Montgomery multiplication to speed up modular multiplication, and we reduce the number of operations through procedural simplifications. Modular addition, subtraction, and multiplication are vectorized using the Intel Advanced Vector Extensions (AVX), Intel Advanced Vector Extensions 2 (AVX2), and Intel Advanced Vector Extensions 512 (AVX-512) instruction sets. The number of magic series of order 8000 is computed on multiple Intel Xeon Phi processor nodes with a total execution time of 1806 days. The results are compared with prior work in the literature to confirm the efficacy of the approach.
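For readers unfamiliar with the technique, the following is a minimal scalar sketch of textbook Montgomery multiplication (Montgomery, 1985), assuming R = 2^64, an odd modulus N with 1 < N < R, and a final conditional subtraction. It is an illustration under those assumptions, not the improved, vectorized algorithm of this paper; all function names are hypothetical, and Python integers stand in for 64-bit machine words through explicit masking.

```python
# Minimal scalar sketch of textbook Montgomery multiplication (REDC).
# Assumptions: R = 2**64, odd modulus n with 1 < n < R; Python ints emulate
# 64-bit words via masking. Function names are hypothetical, not the paper's.

WORD_BITS = 64
R = 1 << WORD_BITS
MASK = R - 1


def mont_setup(n):
    """Return (mu, r2) with mu = -n^{-1} mod R and r2 = R^2 mod n."""
    if n % 2 == 0 or not 1 < n < R:
        raise ValueError("modulus must be odd with 1 < n < 2**64")
    mu = (-pow(n, -1, R)) & MASK
    r2 = (R * R) % n
    return mu, r2


def redc(t, n, mu):
    """Montgomery reduction: return t * R^{-1} mod n, for 0 <= t < n * R."""
    q = ((t & MASK) * mu) & MASK    # q = t * mu mod R
    u = (t + q * n) >> WORD_BITS    # exact: t + q*n is divisible by R
    return u - n if u >= n else u   # u < 2n, so one subtraction suffices


def mont_mul(a, b, n, mu):
    """Product of two Montgomery-form residues: a * b * R^{-1} mod n."""
    return redc(a * b, n, mu)


def to_mont(x, n, mu, r2):
    """Map x to Montgomery form x * R mod n."""
    return mont_mul(x % n, r2, n, mu)


def from_mont(x, n, mu):
    """Map a Montgomery-form value x back to x * R^{-1} mod n."""
    return redc(x, n, mu)


def mod_add(a, b, n):
    """(a + b) mod n for 0 <= a, b < n, without division."""
    s = a + b
    return s - n if s >= n else s


def mod_sub(a, b, n):
    """(a - b) mod n for 0 <= a, b < n, without division."""
    return a - b if a >= b else a - b + n


def mod_mul_via_montgomery(a, b, n):
    """Round trip: a * b mod n computed through Montgomery form."""
    mu, r2 = mont_setup(n)
    am, bm = to_mont(a, n, mu, r2), to_mont(b, n, mu, r2)
    return from_mont(mont_mul(am, bm, n, mu), n, mu)
```

In a long chain of products the operands stay in Montgomery form, so the conversions are paid only once. Modular addition and subtraction work unchanged on Montgomery-form values because \(xR + yR \equiv (x + y)R \pmod N\).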
References
[1] Biggs, N.L.: The roots of combinatorics. Historia Math. 6(2), 109–136 (1979). https://doi.org/10.1016/0315-0860(79)90074-0
[2] Nordgren, R.P.: On properties of special magic square matrices. Linear Algebra Appl. 437(8), 2009–2025 (2012). https://doi.org/10.1016/j.laa.2012.05.031
[3] Cammann, S.: The evolution of magic squares in China. J. Am. Orient. Soc. 80(2), 116–124 (1960). https://doi.org/10.2307/595587
[4] Xin, G.: Constructing all magic squares of order three. Discrete Math. 308(15), 3393–3398 (2008). https://doi.org/10.1016/j.disc.2007.06.022
[5] Beeler, M.: Appendix 5: The Order 5 Magic Squares (1973). (Privately Published)
[6] Pinn, K., Wieczerkowski, C.: Number of magic squares from parallel tempering Monte Carlo. Int. J. Mod. Phys. C 9(4), 541–546 (1998). https://doi.org/10.1142/S0129183198000443
[7] Trump, W.: Magic Series. http://www.trump.de/magic-squares/magic-series
[8] Beck, M., van Herick, A.: Enumeration of \(4 \times 4\) magic squares. Math. Comput. 80, 617–621 (2011). https://doi.org/10.1090/S0025-5718-10-02347-1
[9] Ripatti, A.: On the number of semi-magic squares of order 6 (2018). arXiv: 1807.02983
[10] Kato, G., Minato, S.: Enumeration of associative magic squares of order 7 (2019). arXiv: 1906.07461
[11] Libis, C., Phillips, J.D., Spall, M.: How many magic squares are there? Math. Mag. 73(1), 57–58 (2000). https://doi.org/10.1080/0025570X.2000.11996804
[12] Kraitchik, M.: Mathematical Recreations, 2nd revised edn. Dover Publications (2006)
[13] Bottomley, H.: Partition and composition calculator. http://www.se16.info/js/partitions.htm
[14] Gerbicz, R.: Robert Gerbicz’s Home Page. https://sites.google.com/site/robertgerbicz
[15] Kinnaes, D.: Calculating exact values of \(N(x, m)\) without using recurrence relations (2013). http://www.trump.de/magic-squares/magic-series/kinnaes-algorithm.pdf
[16] Endo, K.: Private Communication (2019)
[17] Quist, M.: Asymptotic enumeration of magic series (2013). arXiv: 1306.0616
[18] Kinnaes, D.: Private Communication (2019)
[19] Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44, 519–521 (1985). https://doi.org/10.1090/S0025-5718-1985-0777282-X
[20] Intel Corporation: Intel 64 and IA-32 Architectures Software Developer’s Manual. https://software.intel.com/en-us/articles/intel-sdm
[21] Koç, Ç.K., Acar, T., Kaliski Jr., B.S.: Analyzing and comparing Montgomery multiplication algorithms. IEEE Micro 16(3), 26–33 (1996). https://doi.org/10.1109/40.502403
[22] Bos, J.W., Montgomery, P.L., Shumow, D., Zaverucha, G.M.: Montgomery multiplication using vector instructions. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 471–489. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_24
[23] Takahashi, D.: Computation of the \(100\) quadrillionth hexadecimal digit of \(\pi \) on a cluster of Intel Xeon Phi processors. Parallel Comput. 75, 1–10 (2018). https://doi.org/10.1016/j.parco.2018.02.002
[24] Dussé, S.R., Kaliski Jr., B.S.: A cryptographic library for the Motorola DSP56000. In: Damgård, I.B. (ed.) EUROCRYPT 1990. LNCS, vol. 473, pp. 230–244. Springer, Heidelberg (1991). https://doi.org/10.1007/3-540-46877-3_21
[25] Walter, C.D.: Montgomery’s multiplication technique: how to make it smaller and faster. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 80–93. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48059-5_9
[26] OpenMP Architecture Review Board: OpenMP. https://www.openmp.org
[27] Selberg, A.: An elementary proof of Dirichlet’s theorem about primes in an arithmetic progression. Ann. Math. 50(2), 297–304 (1949). https://doi.org/10.2307/1969454
[28] Trump, W.: Private Communication (2019)
[29] Revol, N., Rouillier, F.: Motivations for an arbitrary precision interval arithmetic and the MPFI library. Reliable Comput. 11(4), 275–290 (2005). https://doi.org/10.1007/s11155-005-6891-y
[30] Adams, W.W., Goldstein, L.J.: Introduction to Number Theory. Prentice-Hall (1976)
[31] Childs, L.N.: A Concrete Introduction to Higher Algebra, 3rd edn. Springer, New York (2009). https://doi.org/10.1007/978-0-387-74725-5
Appendix Proofs of Algorithms
Theorem 1
Let N be a prime number, r be a prime factor of \(N - 1\), and z be a positive integer such that \(0< z < N\). Then, \(\omega = z^{(N - 1)/r} \bmod N\) is a primitive r-th root of unity in \(\mathbb {Z}/{N}\mathbb {Z}\) if \(\omega \not \equiv 1 \pmod N\).
Proof
For a prime N, let \(\mathop {\mathrm {ord}}(z)\) denote the order of an integer z in \(\mathbb {Z}/{N}\mathbb {Z}\), that is, the smallest positive integer k such that \(z^{k} \equiv 1 \pmod N\) [30, 31]; it exists because \(0< z < N\) makes z coprime to N. We show that the order of \(\omega \) is r if \(\omega \not \equiv 1 \pmod N\).
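For every positive integer k, it holds that [30]
\[ \mathop {\mathrm {ord}}(z^k) = \frac{\mathop {\mathrm {ord}}(z)}{\gcd (\mathop {\mathrm {ord}}(z), k)}. \]
Since, by Lagrange’s theorem, \(\mathop {\mathrm {ord}}(z)\) divides \(N - 1\), there exists an integer s that divides \(N - 1\) and satisfies \(\mathop {\mathrm {ord}}(z) = \frac{N - 1}{s}\). Applying the formula with \(k = \frac{N - 1}{r}\), and using \(\gcd \bigl (\frac{N - 1}{s}, \frac{N - 1}{r}\bigr ) = \frac{N - 1}{\mathop {\mathrm {lcm}}(r, s)}\), which holds because both r and s divide \(N - 1\), gives
\[ \mathop {\mathrm {ord}}(\omega ) = \frac{(N - 1)/s}{(N - 1)/\mathop {\mathrm {lcm}}(r, s)} = \frac{\mathop {\mathrm {lcm}}(r, s)}{s} = \frac{r}{\gcd (r, s)}. \]
Since r is prime, \(\gcd (r, s)\) is either 1 or r, so the order of \(\omega \) is either r or 1.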
Finally, if the order of \(\omega \) is 1, then \(\omega = \omega ^{\mathop {\mathrm {ord}}(\omega )} \equiv 1 \pmod N\). By contraposition, if \(\omega \not \equiv 1 \pmod N\), then the order of \(\omega \) is r, that is, \(\omega \) is a primitive r-th root of unity.
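Theorem 1 translates directly into a search for a primitive r-th root of unity: try \(z = 2, 3, \dots \) until \(z^{(N - 1)/r} \bmod N \ne 1\). The sketch below assumes, without checking, that N is prime and r is a prime factor of \(N - 1\); the function names and the brute-force order check (usable only for small N) are illustrative, not taken from the paper.

```python
def primitive_root_of_unity(n, r):
    """Return a primitive r-th root of unity modulo n, following Theorem 1.

    Assumes (unchecked) that n is prime and r is a prime factor of n - 1.
    Only (n - 1) / r residues z satisfy z^((n-1)/r) = 1 (mod n), so for
    r >= 2 the loop always finds a candidate w != 1.
    """
    if (n - 1) % r != 0:
        raise ValueError("r must divide n - 1")
    k = (n - 1) // r
    for z in range(2, n):
        w = pow(z, k, n)  # w = z^((n-1)/r) mod n
        if w != 1:
            return w
    raise ValueError("no root found; check that n is prime and r | n - 1")


def multiplicative_order(x, n):
    """Smallest k >= 1 with x^k = 1 (mod n); x must be coprime to n.

    Brute force, intended only for checking small examples.
    """
    k, y = 1, x % n
    while y != 1:
        y = (y * x) % n
        k += 1
    return k
```

For example, with N = 13 and r = 3, z = 2 gives \(\omega = 2^4 \bmod 13 = 3\); indeed \(3^3 = 27 \equiv 1 \pmod {13}\) while \(3 \not \equiv 1\), so the order of \(\omega \) is 3.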
Theorem 2
Replacing the first \(e - 1\) moduli \(\beta \) with \(2 \beta \) does not change the result of Algorithm 4.
Proof
We denote the variables after the substitution with a superscript \(\prime \) and show that \(C_e^\prime = C_e\).
If \(i = 0\), then \(t_0^\prime = a_0 B = t_0\), and hence \(q_0^\prime = \mu t_0^\prime \bmod 2\beta = \mu t_0 \bmod 2\beta \). Because \(q_0^\prime \equiv \mu t_0 \equiv q_0 \pmod \beta \) and \(0 \le q_0^\prime < 2\beta \), there are two cases: \(q_0^\prime = q_0\) (when \(q_0^\prime < \beta \)) and \(q_0^\prime = q_0 + \beta \) (when \(q_0^\prime \ge \beta \)). The first case is the same as in Algorithm 4, so \(C_1^\prime = C_1\). As for the second case, in which \(q_0^\prime = q_0 + \beta \),
Now assume that, for some i with \(1 \le i \le e - 2\), \(C_i^\prime \) is equal to \(C_i\) or \(C_i + N\).
If \(C_i^\prime = C_i\), then
and hence \(q_i^\prime = \mu t_i^\prime \bmod 2\beta = \mu t_i \bmod 2\beta \). Therefore,
On the other hand, if \(C_i^\prime = C_i + N\), then
and hence
Here, \(\mu t_i\) is equal to \(q_i\) or \(q_i + \beta \) modulo \(2\beta \), and \(\mu N\) is equal to \(-1\) or \((-1 + \beta )\) modulo \(2\beta \) since \(\mu = -N^{-1} \bmod \beta \). Therefore, \(q_i^\prime \) becomes
modulo \(2 \beta \). Therefore,
By mathematical induction, it holds that \(C_{i + 1}^\prime \) is equal to \(C_{i + 1}\) or \(C_{i + 1} + N\) for \(i = 1, 2, \dots , e - 2\).
If \(i = e - 1\), then \(C_{e - 1}^\prime \) is equal to \(C_{e - 1}\) or \(C_{e - 1} + N\). If \(C_{e - 1}^\prime = C_{e - 1}\), the computation is the same as in Algorithm 4, so \(C_e^\prime = C_e\). If \(C_{e - 1}^\prime = C_{e - 1} + N\), then
and hence
Therefore,
assuming that \(q_{e - 1}^\prime N = q_{e - 1} N - N\) does not overflow.
Since \(0 \le C_i < 2 N\), it follows that \(0 \le C_i^\prime < 3 N\). Therefore, to avoid overflow in the 64-bit registers of the processor, it is required that \(3 N < 2^{64}\). The prerequisite of Algorithm 4 is \(N < \beta ^e\); hence, when \(e = 2\) and \(\beta = 2^{31}\), \(N< 2^{62} = 2^{64}/4 < 2^{64}/3\). Thus, the condition is always satisfied.
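The base case \(i = 0\) of this proof and the overflow bound can be checked numerically. Algorithm 4 is not reproduced in this section, so the sketch below assumes, as the proof suggests, a word-by-word Montgomery step with \(C_0 = 0\), \(t_i = C_i + a_i B\), \(q_i = \mu t_i \bmod \beta \) and \(C_{i+1} = (t_i + q_i N)/\beta \); the function names are hypothetical. It shows that reducing \(q_0\) modulo \(2\beta \) instead of \(\beta \) changes \(q_0\) by 0 or \(\beta \), and \(C_1\) by 0 or N, while the division by \(\beta \) stays exact.

```python
# Hypothetical sketch of the i = 0 case of Theorem 2, ASSUMING the Montgomery
# step t_i = C_i + a_i*B, q_i = mu*t_i mod m, C_{i+1} = (t_i + q_i*N) / beta,
# where m = beta (original) or m = 2*beta (substituted).
BETA = 1 << 31


def montgomery_mu(n):
    """mu = -N^{-1} mod beta for odd N."""
    return (-pow(n, -1, BETA)) % BETA


def word_step(c, a_i, b, n, mu, m):
    """One step; returns (q_i, C_{i+1}) with q_i reduced modulo m."""
    t = c + a_i * b
    q = (mu * t) % m
    num = t + q * n
    # q = mu*t (mod beta) and mu*N = -1 (mod beta), so beta divides num.
    assert num % BETA == 0
    return q, num // BETA


def first_step_difference(a0, b, n):
    """Return (q0' - q0, C1' - C1) for modulus 2*beta versus beta."""
    mu = montgomery_mu(n)
    q, c1 = word_step(0, a0, b, n, mu, BETA)
    q_prime, c1_prime = word_step(0, a0, b, n, mu, 2 * BETA)
    return q_prime - q, c1_prime - c1


# Overflow bound for e = 2: N < beta**2 = 2**62, so 3N < 3 * 2**62 < 2**64.
assert 3 * BETA ** 2 < 2 ** 64
```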
Theorem 3
Replacing \(s_2 \leftarrow a_1 b_0 + (t \bmod \beta )\) by \(s_2 \leftarrow a_1 b_0 + t\), and \(s_3 \leftarrow a_1 b_1 + \lfloor t/\beta \rfloor \) by \(s_3 \leftarrow a_1 b_1\), does not change the result of Algorithm 5 when \(N < 2^{61}\).
Proof
We denote the variables after the substitution with a superscript \(\prime \) and show that \(u^\prime = u\).
It holds that
Therefore,
Since \(t - (t \bmod \beta )\) is a multiple of \(\beta \), it holds that \((t - (t \bmod \beta )) \mu \bmod \beta = 0\), and hence
Furthermore, since
it holds that
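The key step of the proof of Theorem 3, \((t - (t \bmod \beta )) \mu \bmod \beta = 0\), means that adding the full t instead of its low word \(t \bmod \beta \) leaves any product with \(\mu \) unchanged modulo \(\beta \). The sketch below checks this numerically for \(\beta = 2^{31}\); the function name and test values are hypothetical, and Algorithm 5 itself (including exactly where \(s_2\) is multiplied by \(\mu \)) is not reproduced here.

```python
# Numerical check of the identity used in the proof of Theorem 3:
# (t - (t mod beta)) * mu = 0 (mod beta), hence
# mu * (x + t) = mu * (x + (t mod beta)) (mod beta) for any x.
BETA = 1 << 31


def low_word_reduction_is_redundant(x, t, mu):
    """True if reducing t to its low word does not change mu*(x + .) mod beta."""
    high_part = t - t % BETA                # a multiple of beta
    assert (high_part * mu) % BETA == 0
    full = (mu * (x + t)) % BETA            # e.g. s_2' = a_1*b_0 + t
    low = (mu * (x + t % BETA)) % BETA      # e.g. s_2 = a_1*b_0 + (t mod beta)
    return full == low
```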
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sugizaki, Y., Takahashi, D. (2020). Fast Computation of the Exact Number of Magic Series with an Improved Montgomery Multiplication Algorithm. In: Qiu, M. (ed.) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol 12453. Springer, Cham. https://doi.org/10.1007/978-3-030-60239-0_25
DOI: https://doi.org/10.1007/978-3-030-60239-0_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60238-3
Online ISBN: 978-3-030-60239-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)