ARM/NEON Co-design of Multiplication/Squaring

  • Conference paper
Information Security Applications (WISA 2017)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10763)

Abstract

Many modern mobile processors support SIMD extensions (e.g., the NEON engine), and applications originally written for SISD execution (e.g., image processing and cryptography) can be accelerated by rewriting them with SIMD instructions. Integer multiplication and squaring are the most expensive operations in Public Key Cryptography (PKC), and many works have reduced their execution time using the NEON instruction set. However, ARM–NEON processors also provide a powerful ARM instruction set, and using ARM instructions together with the NEON engine can improve performance further. Based on this observation, we introduce a new parallel approach for integer multiplication and squaring on ARM–NEON processors. Unlike previous implementations, we interleave ARM and NEON instructions so that the latency of the ARM computation is hidden behind the NEON computation. Since the ARM and NEON units are separate, their instructions are issued independently. Integer multiplication and squaring are finely divided into several sub-tasks, which are assigned to ARM and NEON so as to balance the workloads. The proposed implementations outperform the best-known results on identical ARM–NEON processors by 22.4% and 18.3% for 2048-bit integer multiplication and squaring, respectively.
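The workload division mentioned in the abstract can be illustrated with plain integer arithmetic. The following is a minimal sketch, not the authors' implementation: it splits an m-bit multiplication into four m/2-bit partial products, three for one unit and one for the other as in the paper's first note, and recombines them. The names `neon_part`, `arm_part`, and `split_multiply` are hypothetical, the choice of which partial product goes to which unit is an assumption made for illustration, and both parts simply run one after the other here rather than in parallel.

```python
def neon_part(a_lo, a_hi, b_lo, b_hi):
    # Hypothetical stand-in for the NEON share: three half-size products.
    return a_lo * b_lo, a_lo * b_hi, a_hi * b_lo


def arm_part(a_hi, b_hi):
    # Hypothetical stand-in for the ARM share: one half-size product.
    return a_hi * b_hi


def split_multiply(a, b, m):
    """Multiply two m-bit integers (m even) via four m/2-bit products."""
    half = m // 2
    mask = (1 << half) - 1
    # Split each operand into a low and a high m/2-bit half.
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # The two shares do not depend on each other, so they could overlap.
    ll, lh, hl = neon_part(a_lo, a_hi, b_lo, b_hi)
    hh = arm_part(a_hi, b_hi)
    # a*b = ll + (lh + hl) * 2^(m/2) + hh * 2^m
    return ll + ((lh + hl) << half) + (hh << (2 * half))
```

For any two 2048-bit integers `x` and `y`, `split_multiply(x, y, 2048)` should equal `x * y`; the sketch shows only the arithmetic of the split, not the instruction scheduling that yields the reported speed-up.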

This work was supported by the Energy Efficiency & Resources Core Technology Program of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), with financial resources granted by the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20152000000170). Hwajeong Seo was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2014-0-00743) supervised by the IITP (Institute for Information & communications Technology Promotion). Zhi Hu was partially supported by the Natural Science Foundation of China (Grant No. 61602526).

Notes

  1. When an m-bit multiplication is required, it is evenly divided into four m/2-bit multiplications. Three of them are performed with the COS method on the NEON engine; the remaining one is performed on the ARM processor with the hybrid-scanning method (width: 64-bit).

  2. \(\mathtt{umlal\; a,b,c,d:}\; \{\mathtt{b,a}\} \leftarrow \{\mathtt{b,a}\}+ \mathtt{c} \times \mathtt{d}\).

  3. If multi-core processing is defined through the OpenMP library and multiple threads are executed, the performance scales with the number of threads.
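The `umlal` semantics given in the notes can be modelled directly. The sketch below is an illustration under stated assumptions, not code from the paper: `umlal` emulates the instruction on 32-bit register values, with the 64-bit accumulator wrapping modulo 2^64, and `mul_limbs` uses it in a plain row-wise (operand-scanning) multi-precision multiplication. That loop is a simpler technique than the hybrid-scanning method the paper names, and the helper names `mul_limbs`, `to_limbs`, and `from_limbs` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def umlal(a, b, c, d):
    """Model of umlal a,b,c,d: {b,a} <- {b,a} + c*d, with a the low word."""
    acc = (((b << 32) | a) + c * d) & MASK64  # 64-bit accumulator wraps
    return acc & MASK32, acc >> 32            # new (a, b)


def mul_limbs(x, y):
    """Row-wise product of two little-endian lists of 32-bit limbs."""
    r = [0] * (len(x) + len(y))
    for i, yi in enumerate(y):
        carry = 0
        for j, xj in enumerate(x):
            # r[i+j] + xj*yi is at most 2^64 - 2^32, so it cannot wrap.
            lo, hi = umlal(r[i + j], 0, xj, yi)
            total = ((hi << 32) | lo) + carry  # at most 2^64 - 1
            r[i + j] = total & MASK32
            carry = total >> 32
        r[i + len(x)] = carry
    return r


def to_limbs(n, count):
    # Split a non-negative integer into `count` little-endian 32-bit limbs.
    return [(n >> (32 * k)) & MASK32 for k in range(count)]


def from_limbs(limbs):
    # Reassemble an integer from little-endian 32-bit limbs.
    return sum(limb << (32 * k) for k, limb in enumerate(limbs))
```

As a usage example, `from_limbs(mul_limbs(to_limbs(x, 4), to_limbs(y, 4)))` should equal `x * y` for any 128-bit `x` and `y`. On real hardware the carry would be propagated with add-with-carry instructions rather than the Python addition used here.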

Corresponding author

Correspondence to Howon Kim.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Seo, H., Park, T., Ji, J., Hu, Z., Kim, H. (2018). ARM/NEON Co-design of Multiplication/Squaring. In: Kang, B., Kim, T. (eds) Information Security Applications. WISA 2017. Lecture Notes in Computer Science, vol 10763. Springer, Cham. https://doi.org/10.1007/978-3-319-93563-8_7

  • DOI: https://doi.org/10.1007/978-3-319-93563-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93562-1

  • Online ISBN: 978-3-319-93563-8

  • eBook Packages: Computer Science, Computer Science (R0)
