Fast Computation of the Exact Number of Magic Series with an Improved Montgomery Multiplication Algorithm

Sugizaki, Yukimasa; Takahashi, Daisuke

doi:10.1007/978-3-030-60239-0_25

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12453)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing


Abstract

The numbers of magic series of large orders are computed on Intel Xeon Phi processors with an improved and optimized Montgomery multiplication algorithm. The number of magic series can be efficiently computed by Kinnaes’ formula, of which the most time-consuming element is modular multiplication. We use Montgomery multiplication for faster modular multiplication, and the number of operations is reduced through procedural simplifications. Modular addition, subtraction, and multiplication operations are vectorized by using the following instructions: Intel Advanced Vector Extensions (AVX), Intel Advanced Vector Extensions 2 (AVX2), and Intel Advanced Vector Extensions 512 (AVX-512). The number of magic series of order 8000 is computed on multiple nodes of an Intel Xeon Phi processor with a total execution time of 1806 days. Results are compared with salient studies in the literature to confirm the efficacy of the approach.


References

1. Biggs, N.L.: The roots of combinatorics. Historia Math. 6(2), 109–136 (1979). https://doi.org/10.1016/0315-0860(79)90074-0
2. Nordgren, R.P.: On properties of special magic square matrices. Linear Algebra Appl. 437(8), 2009–2025 (2012). https://doi.org/10.1016/j.laa.2012.05.031
3. Cammann, S.: The evolution of magic squares in China. J. Am. Orient. Soc. 80(2), 116–124 (1960). https://doi.org/10.2307/595587
4. Xin, G.: Constructing all magic squares of order three. Discrete Math. 308(15), 3393–3398 (2008). https://doi.org/10.1016/j.disc.2007.06.022
5. Beeler, M.: Appendix 5: The Order 5 Magic Squares (1973). (Privately Published)
6. Pinn, K., Wieczerkowski, C.: Number of magic squares from parallel tempering Monte Carlo. Int. J. Mod. Phys. C 9(4), 541–546 (1998). https://doi.org/10.1142/S0129183198000443
7. Trump, W.: Magic Series. http://www.trump.de/magic-squares/magic-series
8. Beck, M., van Herick, A.: Enumeration of $4 \times 4$ magic squares. Math. Comput. 80, 617–621 (2011). https://doi.org/10.1090/S0025-5718-10-02347-1
9. Ripatti, A.: On the number of semi-magic squares of order 6 (2018). arXiv:1807.02983
10. Kato, G., Minato, S.: Enumeration of associative magic squares of order 7 (2019). arXiv:1906.07461
11. Libis, C., Phillips, J.D., Spall, M.: How many magic squares are there? Math. Mag. 73(1), 57–58 (2000). https://doi.org/10.1080/0025570X.2000.11996804
12. Kraitchik, M.: Mathematical Recreations, 2nd revised edn. Dover Publications (2006)
13. Bottomley, H.: Partition and composition calculator. http://www.se16.info/js/partitions.htm
14. Gerbicz, R.: Robert Gerbicz’s Home Page. https://sites.google.com/site/robertgerbicz
15. Kinnaes, D.: Calculating exact values of $N(x, m)$ without using recurrence relations (2013). http://www.trump.de/magic-squares/magic-series/kinnaes-algorithm.pdf
16. Endo, K.: Private Communication (2019)
17. Quist, M.: Asymptotic enumeration of magic series (2013). arXiv:1306.0616
18. Kinnaes, D.: Private Communication (2019)
19. Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44, 519–521 (1985). https://doi.org/10.1090/S0025-5718-1985-0777282-X
20. Intel Corporation: Intel 64 and IA-32 Architectures Software Developer’s Manual. https://software.intel.com/en-us/articles/intel-sdm
21. Koç, Ç.K., Acar, T., Kaliski Jr., B.S.: Analyzing and comparing Montgomery multiplication algorithms. IEEE Micro 16(3), 26–33 (1996). https://doi.org/10.1109/40.502403
22. Bos, J.W., Montgomery, P.L., Shumow, D., Zaverucha, G.M.: Montgomery multiplication using vector instructions. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 471–489. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_24
23. Takahashi, D.: Computation of the $100$ quadrillionth hexadecimal digit of $\pi $ on a cluster of Intel Xeon Phi processors. Parallel Comput. 75, 1–10 (2018). https://doi.org/10.1016/j.parco.2018.02.002
24. Dussé, S.R., Kaliski Jr., B.S.: A cryptographic library for the Motorola DSP56000. In: Damgård, I.B. (ed.) EUROCRYPT 1990. LNCS, vol. 473, pp. 230–244. Springer, Heidelberg (1991). https://doi.org/10.1007/3-540-46877-3_21
25. Walter, C.D.: Montgomery’s multiplication technique: how to make it smaller and faster. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 80–93. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48059-5_9
26. OpenMP Architecture Review Boards: OpenMP. https://www.openmp.org
27. Selberg, A.: An elementary proof of Dirichlet’s theorem about primes in an arithmetic progression. Ann. Math. 50(2), 297–304 (1949). https://doi.org/10.2307/1969454
28. Trump, W.: Private Communication (2019)
29. Revol, N., Rouillier, F.: Motivations for an arbitrary precision interval arithmetic and the MPFI library. Reliable Comput. 11(4), 275–290 (2005). https://doi.org/10.1007/s11155-005-6891-y
30. Adams, W.W., Goldstein, L.J.: Introduction to Number Theory. Prentice-Hall (1976)
31. Childs, L.N.: A Concrete Introduction to Higher Algebra, 3rd edn. Springer, New York (2009). https://doi.org/10.1007/978-0-387-74725-5


Author information

Authors and Affiliations

Graduate School of Science and Technology, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8573, Japan
Yukimasa Sugizaki
Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
Daisuke Takahashi


Corresponding author

Correspondence to Yukimasa Sugizaki.

Editor information

Editors and Affiliations

Columbia University, New York, NY, USA
Meikang Qiu

Appendix: Proofs of Algorithms

Theorem 1

Let N be a prime number, r be a prime factor of $N - 1$, and z be a positive integer such that $0< z < N$. Then, $\omega = z^{(N - 1)/r} \bmod N$ is a primitive r-th root of unity in $\mathbb {Z}/{N}\mathbb {Z}$ if $\omega \not \equiv 1 \pmod N$.

Proof

Let $\mathop {\mathrm {ord}}(z)$ denote the order of an integer z in $\mathbb {Z}/{N}\mathbb {Z}$ for prime N, that is, the smallest positive integer such that $z^{\mathop {\mathrm {ord}}(z)}$ is congruent to 1 modulo N [30, 31]. We show that the order of $\omega $ is r if $\omega \not \equiv 1 \pmod N$.

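For any positive integer k, it holds that $\mathop {\mathrm {ord}}(z^k) = \mathop {\mathrm {ord}}(z)/\gcd (k, \mathop {\mathrm {ord}}(z))$ [30]. Since Lagrange’s theorem states that $\mathop {\mathrm {ord}}(z)$ divides $N - 1$, there exists an integer s which divides $N - 1$ and satisfies $\mathop {\mathrm {ord}}(z) = \frac{N - 1}{s}$. Then, with $k = (N - 1)/r$, it holds that

$$\begin{aligned} \mathop {\mathrm {ord}}(\omega )&= \frac{(N - 1)/s}{\gcd ((N - 1)/r, \, (N - 1)/s)} \\&= \frac{(N - 1)/s}{(N - 1) \gcd (r, s)/(r s)} \\&= \frac{r}{\gcd (r, s)}. \end{aligned}$$

Because r is prime, $\gcd (r, s)$ is equal to 1 or r, and hence the order of $\omega $ is r or 1.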

Now, if the order of $\omega $ is 1, then $\omega = \omega ^{\mathop {\mathrm {ord}}(\omega )} \equiv 1 \pmod N$. The contraposition of this statement shows that the order of $\omega $ is r if $\omega \not \equiv 1 \pmod N$.
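Theorem 1 can be tried out numerically with a short Python sketch. The function names `multiplicative_order` and `primitive_root_of_unity` are illustrative and do not come from the paper; the code assumes, without checking, that N is prime and that r is a prime factor of $N - 1$, and it finds the order by brute force, so it is only suitable for small N.

```python
def multiplicative_order(w: int, N: int) -> int:
    """Smallest k > 0 with w**k congruent to 1 modulo N (assumes gcd(w, N) == 1)."""
    k, x = 1, w % N
    while x != 1:
        x = (x * w) % N
        k += 1
    return k


def primitive_root_of_unity(N: int, r: int) -> int:
    """Return a primitive r-th root of unity modulo N.

    Assumption (not verified here): N is prime and r is a prime factor of N - 1.
    """
    if (N - 1) % r != 0:
        raise ValueError("r must divide N - 1")
    for z in range(2, N):
        omega = pow(z, (N - 1) // r, N)
        if omega != 1:
            # By Theorem 1, omega != 1 already implies that its order is r.
            return omega
    raise ValueError("no candidate z produced omega != 1")


if __name__ == "__main__":
    omega = primitive_root_of_unity(13, 3)
    print(omega, multiplicative_order(omega, 13))
```

For $N = 13$ and $r = 3$, the first candidate $z = 2$ gives $\omega = 2^4 \bmod 13 = 3$, whose order is 3, as the theorem predicts.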

Theorem 2

Replacing the first $e - 1$ moduli $\beta $ with $2 \beta $ does not change the result of Algorithm 4.

Proof

Denote variables after substitution with superscript $\prime $. We show that $C_e^\prime = C_e$.

If $i = 0$, then $t_0^\prime = a_0 B = t_0$ and hence $q_0^\prime = \mu t_0^\prime \bmod 2\beta = \mu t_0 \bmod 2\beta $. There are therefore two cases: $q_0^\prime < \beta $ and $q_0^\prime \ge \beta $. If $q_0^\prime < \beta $, then $q_0^\prime = q_0$, which is the same situation as in Algorithm 4, so $C_1^\prime = C_1$. If $q_0^\prime \ge \beta $, then $q_0^\prime = q_0 + \beta $ and

$$\begin{aligned} C_1^\prime&= (t_0^\prime + q_0^\prime N)/\beta \\&= (t_0 + q_0 N + \beta N)/\beta \\&= C_1 + N. \end{aligned}$$

Assume that $C_i^\prime $ is equal to $C_i$ or $C_i + N$ where $1 \le i \le e - 2$.

If $C_i^\prime = C_i$, then

$$\begin{aligned} t_i^\prime&= C_i^\prime + a_i B \\&= C_i + a_i B \\&= t_i \end{aligned}$$

and hence $q_i^\prime = \mu t_i^\prime \bmod 2\beta = \mu t_i \bmod 2\beta $. Therefore,

$$\begin{aligned} C_{i + 1}^\prime&= (t_i^\prime + q_i^\prime N)/\beta \\&= {\left\{ \begin{array}{ll} (t_i + q_i N)/\beta = C_{i + 1} &{} \text {if }q_i^\prime < \beta \text { and thus }q_i^\prime = q_i \\ (t_i + q_i N + \beta N)/\beta = C_{i + 1} + N &{} \text {if }q_i^\prime \ge \beta \text { and thus }q_i^\prime = q_i + \beta \end{array}\right. } \end{aligned}$$

On the other hand, if $C_i^\prime = C_i + N$, then

$$\begin{aligned} t_i^\prime&= C_i^\prime + a_i B \\&= C_i + N + a_i B \\&= t_i + N \end{aligned}$$

and hence

$$\begin{aligned} q_i^\prime&= \mu t_i^\prime \bmod 2\beta \\&= \mu (t_i + N) \bmod 2\beta \\&= \{ (\mu t_i \bmod 2\beta ) + (\mu N \bmod 2\beta ) \} \bmod 2\beta . \end{aligned}$$

Here, $\mu t_i$ is equal to $q_i$ or $q_i + \beta $ modulo $2\beta $, and $\mu N$ is equal to $-1$ or $(-1 + \beta )$ modulo $2\beta $ since $\mu = -N^{-1} \bmod \beta $. Therefore, $q_i^\prime $ becomes

$$\begin{aligned} q_i^\prime = {\left\{ \begin{array}{ll} q_i - 1 &{} \text {if }\mu t_i = q_i \bmod 2 \beta , \; \mu N = -1 \bmod 2 \beta \\ q_i - 1 + \beta &{} \text {if }\mu t_i = q_i \bmod 2 \beta , \; \mu N = (-1 + \beta ) \bmod 2 \beta \\ q_i + \beta - 1 = q_i - 1 + \beta &{} \text {if }\mu t_i = (q_i + \beta ) \bmod 2 \beta , \; \mu N = -1 \bmod 2 \beta \\ q_i + \beta - 1 + \beta = q_i - 1 &{} \text {if }\mu t_i = (q_i + \beta ) \bmod 2\beta , \; \mu N = (-1 + \beta ) \bmod 2\beta \\ \end{array}\right. } \end{aligned}$$

modulo $2 \beta $. Therefore,

$$\begin{aligned} C_{i + 1}^\prime&= (t_i^\prime + q_i^\prime N)/\beta \\&= {\left\{ \begin{array}{ll} (t_i + N + q_i N - N)/\beta = C_{i + 1} &{} \text {if }q_i^\prime = q_i - 1 \\ (t_i + N + q_i N - N + \beta N)/\beta = C_{i + 1} + N &{} \text {if }q_i^\prime = q_i - 1 + \beta \\ \end{array}\right. } \end{aligned}$$

By mathematical induction, it holds that $C_{i + 1}^\prime $ is equal to $C_{i + 1}$ or $C_{i + 1} + N$ for $i = 1, 2, \dots , e - 2$.

If $i = e - 1$, then $C_{e - 1}^\prime $ is equal to $C_{e - 1}$ or $C_{e - 1} + N$. As for $C_{e - 1}^\prime = C_{e - 1}$, this is the same case as Algorithm 4, so $C_e^\prime = C_e$. As for $C_{e - 1}^\prime = C_{e - 1} + N$,

$$\begin{aligned} t_{e - 1}^\prime&= C_{e - 1}^\prime + a_{e - 1} B \\&= C_{e - 1} + N + a_{e - 1} B \\&= t_{e - 1} + N \end{aligned}$$

and hence

$$\begin{aligned} q_{e - 1}^\prime&= \mu t_{e - 1}^\prime \bmod \beta \\&= (\mu t_{e - 1} + \mu N) \bmod \beta \\&= (q_{e - 1} - 1) \bmod \beta . \end{aligned}$$

Therefore,

$$\begin{aligned} C_e^\prime&= (t_{e - 1}^\prime + q_{e - 1}^\prime N)/\beta \\&= (t_{e - 1} + N + q_{e - 1} N - N)/\beta \\&= (t_{e - 1} + q_{e - 1} N)/\beta \\&= C_e, \end{aligned}$$

assuming that $q_{e - 1}^\prime N = q_{e - 1} N - N$ does not overflow.

From $0 \le C_i < 2 N$, it follows that $0 \le C_i^\prime < 3 N$. Therefore, to avoid overflow in 64-bit registers of processors, it is required that $3 N < 2^{64}$. From the prerequisite of Algorithm 4, $N < \beta ^e$ and hence $N< 2^{62} = 2^{64}/4 < 2^{64}/3$ when $e = 2$ and $\beta = 2^{31}$. Thus, the condition is inherently satisfied.
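The recurrences used in this proof can be exercised with a small Python sketch. Algorithm 4 itself is not reproduced in this appendix, so the loop below is reconstructed from the proof alone and rests on the following assumptions: $C_0 = 0$, $t_i = C_i + a_i B$, $q_i = \mu t_i \bmod \beta $, $C_{i + 1} = (t_i + q_i N)/\beta $, the $a_i$ are the radix-$\beta $ digits of one operand, and N is odd. The name `montgomery_product` and the `relaxed` flag are illustrative.

```python
import random


def montgomery_product(a_digits, B, N, beta, relaxed=False):
    """Radix-beta Montgomery product; returns C_e with C_e * beta**e = A * B (mod N).

    a_digits holds the digits a_0, ..., a_{e-1} of A (least significant first).
    With relaxed=True, the first e - 1 reductions use modulus 2*beta, as in Theorem 2.
    """
    e = len(a_digits)
    mu = (-pow(N, -1, beta)) % beta          # mu = -N^{-1} mod beta (Python 3.8+)
    C = 0
    for i, a_i in enumerate(a_digits):
        t = C + a_i * B
        modulus = 2 * beta if (relaxed and i < e - 1) else beta
        q = (mu * t) % modulus
        assert (t + q * N) % beta == 0       # the division below is exact
        C = (t + q * N) // beta
    return C


if __name__ == "__main__":
    beta, N = 2**31, 2**61 - 1               # example values, e = 2
    A, B = random.randrange(N), random.randrange(N)
    digits = [A % beta, A // beta]
    strict = montgomery_product(digits, B, N, beta)
    loose = montgomery_product(digits, B, N, beta, relaxed=True)
    print(strict, loose, (loose - strict) % N == 0)
```

Python integers never overflow, so the sketch does not exercise the overflow proviso stated at the end of the proof; a check based on it can safely assert only that the strict and relaxed outputs agree modulo N and that both represent $A B \beta ^{-e}$ modulo N.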

Theorem 3

Replacing $s_2 \leftarrow a_1 b_0 + t \bmod \beta $ by $s_2 \leftarrow a_1 b_0 + t$, and $s_3 \leftarrow a_1 b_1 + {\lfloor {}{t/\beta }\rfloor {}}$ by $s_3 \leftarrow a_1 b_1$ does not change the result of Algorithm 5 when $N < 2^{61}$.

Proof

Denote variables after substitution with superscript $\prime $. We show that $u^\prime = u$.

It holds that

$$\begin{aligned} s_2^\prime&= a_1 b_0 + t \\&= (a_1 b_0 + t \bmod \beta ) - t \bmod \beta + t \\&= s_2 + t - t \bmod \beta . \end{aligned}$$

Therefore,

$$\begin{aligned} r^\prime&= s_2^\prime \mu \bmod \beta \\&= (s_2 + t - t \bmod \beta ) \mu \bmod \beta \\&= \{ s_2 \mu \bmod \beta + (t - t \bmod \beta ) \mu \bmod \beta \} \bmod \beta . \end{aligned}$$

Since $t - t \bmod \beta = \beta {\lfloor {}{t/\beta }\rfloor {}}$ is a multiple of $\beta $, it holds that $(t - t \bmod \beta ) \mu \bmod \beta = 0$, and hence

$$\begin{aligned} r^\prime&= (s_2 \mu \bmod \beta + 0) \bmod \beta \\&= s_2 \mu \bmod \beta \\&= r. \end{aligned}$$

Furthermore, since

$$\begin{aligned} s_3^\prime&= a_1 b_1 \\&= (a_1 b_1 + {\lfloor {}{t/\beta }\rfloor {}}) - {\lfloor {}{t/\beta }\rfloor {}} \\&= s_3 - {\lfloor {}{t/\beta }\rfloor {}}, \end{aligned}$$

it holds that

$$\begin{aligned} u^\prime&= (r^\prime N_0 + s_2^\prime )/\beta + r^\prime N_1 + s_3^\prime \\&= (r N_0 + s_2 + t - t \bmod \beta )/\beta + r N_1 + s_3 - {\lfloor {}{t/\beta }\rfloor {}} \\&= (r N_0 + s_2 + t - t \bmod \beta )/\beta + r N_1 + s_3 - (t - t \bmod \beta )/\beta \\&= (r N_0 + s_2)/\beta + r N_1 + s_3 \\&= u. \end{aligned}$$
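The algebra in this proof can be checked numerically. The Python sketch below is not Algorithm 5, which is not reproduced in this appendix; it only evaluates the expressions for $s_2$, $s_3$, r, and u that appear in the proof, treating $a_1$, $b_0$, $b_1$, and t as arbitrary nonnegative integers. It assumes $\beta = 2^{31}$, an odd low word $N_0$, and $\mu = -N_0^{-1} \bmod \beta $; the function names are illustrative.

```python
import random

BETA = 2**31  # radix assumed from the proof of Theorem 2


def u_original(a1, b0, b1, t, N0, N1, mu):
    """u computed with the original definitions of s_2 and s_3."""
    s2 = a1 * b0 + t % BETA
    s3 = a1 * b1 + t // BETA
    r = (s2 * mu) % BETA
    assert (r * N0 + s2) % BETA == 0         # the division below is exact
    return (r * N0 + s2) // BETA + r * N1 + s3


def u_simplified(a1, b0, b1, t, N0, N1, mu):
    """u computed with the simplified s_2' and s_3' of Theorem 3."""
    s2 = a1 * b0 + t                          # no reduction of t modulo beta
    s3 = a1 * b1                              # carry floor(t / beta) is dropped
    r = (s2 * mu) % BETA
    assert (r * N0 + s2) % BETA == 0
    return (r * N0 + s2) // BETA + r * N1 + s3


if __name__ == "__main__":
    N0 = random.randrange(1, BETA, 2)         # odd low word of N
    N1 = random.randrange(BETA)
    mu = (-pow(N0, -1, BETA)) % BETA          # Python 3.8+
    args = [random.randrange(BETA) for _ in range(3)] + [random.randrange(BETA**2)]
    print(u_original(*args, N0, N1, mu) == u_simplified(*args, N0, N1, mu))
```

Because Python integers are unbounded, this confirms only the identity $u^\prime = u$; the bound $N < 2^{61}$ in the theorem concerns fixed-width registers and is not tested here.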


Cite this paper

Sugizaki, Y., Takahashi, D. (2020). Fast Computation of the Exact Number of Magic Series with an Improved Montgomery Multiplication Algorithm. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science(), vol 12453. Springer, Cham. https://doi.org/10.1007/978-3-030-60239-0_25


DOI: https://doi.org/10.1007/978-3-030-60239-0_25
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60238-3
Online ISBN: 978-3-030-60239-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
