Abstract
Cryptographic applications such as symmetric encryption algorithms can be implemented in either bit-slice or word-parallel fashion. Converting between the two data representations corresponds to transposing a bit matrix whose row vectors are the variables. In previous work we demonstrated that combining the best of both variants, i.e., executing part of the code in bit-slice and part in word-parallel manner, can improve performance considerably, but that most of the advantage is consumed by the conversion. Here, we examine the conversion routine more closely and derive different levels of hardware and software support that can accelerate it, ranging from existing but seldom-used instructions to completely new instructions that might be implemented in future systems. We quantify the acceleration achieved by each level of support and provide preliminary experimental results.
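To illustrate the correspondence between representation conversion and bit-matrix transposition, the following is a minimal sketch in Python. It is not the conversion routine studied in the paper: the function name `transpose_bits` and the convention that bit j of row i is matrix entry (i, j) are assumptions made for this example, and the loop is the naive bit-by-bit baseline rather than an accelerated variant.

```python
def transpose_bits(rows, width):
    """Transpose a bit matrix stored as one integer per row.

    Assumed convention (illustrative only): rows[i] holds variable i,
    and bit j of rows[i] is matrix entry (i, j).
    Returns `width` integers; bit i of result[j] equals bit j of rows[i].
    """
    cols = [0] * width
    for i, row in enumerate(rows):
        for j in range(width):
            # Move bit j of variable i to bit i of slice j.
            cols[j] |= ((row >> j) & 1) << i
    return cols


# Word-parallel form: three 4-bit variables, one per word.
words = [0b0011, 0b0101, 0b1111]

# Bit-slice form: slice j collects bit j of every variable.
slices = transpose_bits(words, 4)

# Transposing again (now 4 rows of 3 bits) restores the original words.
restored = transpose_bits(slices, len(words))
```

In bit-slice form, a single bitwise operation on two slices acts on the same bit position of all variables at once, which is why the cost of the transposition determines whether switching representations pays off.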
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Eitschberger, P., Keller, J., Holmbacka, S. (2017). Hardware and Software Support for Transposition of Bit Matrices in High-Speed Encryption. In: Yan, Z., Molva, R., Mazurczyk, W., Kantola, R. (eds) Network and System Security. NSS 2017. Lecture Notes in Computer Science, vol 10394. Springer, Cham. https://doi.org/10.1007/978-3-319-64701-2_12
DOI: https://doi.org/10.1007/978-3-319-64701-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64700-5
Online ISBN: 978-3-319-64701-2
eBook Packages: Computer Science, Computer Science (R0)