Skip to main content

WARP : Revisiting GFN for Lightweight 128-Bit Block Cipher

  • Conference paper
  • First Online:
Selected Areas in Cryptography (SAC 2020)

Abstract

In this article, we present WARP, a lightweight 128-bit block cipher with a 128-bit key. It aims at small-footprint circuit in the field of 128-bit block ciphers, possibly for a unified encryption and decryption functionality. The overall structure of WARP is a variant of 32-nibble Type-2 Generalized Feistel Network (GFN), with a permutation over nibbles designed to optimize the security and efficiency. We conduct a thorough security analysis and report comprehensive hardware and software implementation results. Our hardware results show that WARP is the smallest 128-bit block cipher for most of typical hardware implementation strategies. A serialized circuit of WARP achieves around 800 Gate Equivalents (GEs), which is much smaller than previous state-of-the-art implementations of lightweight 128-bit ciphers (they need more than 1, 000 GEs). While our primary metric is hardware size, WARP also enjoys several other features, most notably low energy consumption. This is somewhat surprising, since GFN generally needs more rounds than substitution permutation network (SPN), and thus GFN has been considered to be less advantageous in this regard. We show a multi-round implementation of WARP is quite low-energy. Moreover, WARP also performs well on software: our SIMD implementation is quite competitive to known hardware-oriented 128-bit lightweight ciphers for long input, and even much better for small inputs due to the small number of parallel blocks. On 8-bit microcontrollers, the results of our assembly implementations show that WARP is flexible to achieve various performance characteristics.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Alternatively, we could use beyond-birthday-bound (BBB) secure modes, however they are generally more complex than the birthday-secure ones, and using complex modes may nullify the merit of using lightweight primitive.

  2. 2.

    https://csrc.nist.gov/Projects/lightweight-cryptography.

  3. 3.

    Liliput [17] is a 128-bit TBC built on a variant of GFN (EGFN [18]). It has a different linear layer structure from GFN and has 16 branches.

  4. 4.

    The name comes from the resemblance of the cipher structure to strings in a loom.

References

  1. Andreeva, E., et al.: COLM v1. a CAESAR portfolio (2016)

    Google Scholar 

  2. Avanzi, R.: The QARMA block cipher family. IACR Trans. Symmetric Cryptol. 2017(1), 4–44 (2017)

    Article  Google Scholar 

  3. Banik, S., et al.: Midori: a block cipher for low energy. In: Iwata, T., Cheon, J.H. (eds.) ASIACRYPT 2015. Part II. LNCS, vol. 9453, pp. 411–436. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48800-3_17

    Chapter  Google Scholar 

  4. Banik, S., Bogdanov, A., Luykx, A., Tischhauser, E.: SUNDAE: small universal deterministic authenticated encryption for the internet of things. IACR Trans. Symmetric Cryptol. 2018(3), 1–35 (2018)

    Google Scholar 

  5. Banik, S., et al.: SUNDAE-GIFT. A Submission to NIST Lightweight Cryptography Project (2019)

    Google Scholar 

  6. Banik, S., Bogdanov, A., Regazzoni, F.: Exploring energy efficiency of lightweight block ciphers. In: Dunkelman, O., Keliher, L. (eds.) SAC 2015. LNCS, vol. 9566, pp. 178–194. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31301-6_10

    Chapter  Google Scholar 

  7. Banik, S., Bogdanov, A., Regazzoni, F.: Atomic-AES: a compact implementation of the AES encryption/decryption core. In: Dunkelman, O., Sanadhya, S.K. (eds.) INDOCRYPT 2016. LNCS, vol. 10095, pp. 173–190. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49890-4_10

    Chapter  Google Scholar 

  8. Banik, S., et al.: GIFT-COFB. A Submission to NIST Lightweight Cryptography Project (2019)

    Google Scholar 

  9. Banik, S., et al.: Towards low energy stream ciphers. IACR Trans. Symmetric Cryptol. 2018(2), 1–19 (2018)

    Google Scholar 

  10. Banik, S., Pandey, S.K., Peyrin, T., Sasaki, Y., Sim, S.M., Todo, Y.: GIFT: a small present. In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 321–345. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66787-4_16

    Chapter  Google Scholar 

  11. Beaulieu, R., Shors, D., Smith, J., Treatman-Clark, S., Weeks, B., Wingers, L.: The SIMON and SPECK families of lightweight block ciphers. Cryptology ePrint Archive, report 2013/404 (2013). http://eprint.iacr.org/2013/404

  12. Beierle, C., Canteaut, A., Leander, G., Rotella, Y.: Proving resistance against invariant attacks: how to choose the round constants. In: Katz, J., Shacham, H. (eds.) CRYPTO 2017. LNCS, vol. 10402, pp. 647–678. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63715-0_22

    Chapter  Google Scholar 

  13. Beierle, C., et al.: The SKINNY family of block ciphers and its low-latency variant MANTIS. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. Part II. LNCS, vol. 9815, pp. 123–153. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53008-5_5

    Chapter  Google Scholar 

  14. Beierle, C., et al.: The SKINNY family of block ciphers and its low-latency variant MANTIS. Cryptology ePrint Archive, report 2016/660 (2016). http://eprint.iacr.org/2016/660

  15. Beierle, C., Leander, G., Moradi, A., Rasoolzadeh, S.: CRAFT: lightweight tweakable block cipher with efficient protection against DFA attacks. IACR Trans. Symmetric Cryptol. 2019(1), 5–45 (2019)

    Article  Google Scholar 

  16. Benadjila, R., Guo, J., Lomné, V., Peyrin, T.: Implementing lightweight block ciphers on x86 architectures. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 324–351. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_17

    Chapter  Google Scholar 

  17. Berger, T.P., Francq, J., Minier, M., Thomas, G.: Extended generalized Feistel networks using matrix representation to propose a new lightweight block cipher: Lilliput. IEEE Trans. Comput. 65(7), 2074–2089 (2016)

    Article  MathSciNet  Google Scholar 

  18. Berger, T.P., Minier, M., Thomas, G.: Extended generalized Feistel networks using matrix representation. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 289–305. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_15

    Chapter  Google Scholar 

  19. Bhargavan, K., Leurent, G.: On the practical (in-)security of 64-bit block ciphers: collision attacks on HTTP over TLS and OpenVPN. In: Weippl, E.R., Katzenbeisser, S., Kruegel, C., Myers, A.C., Halevi, S. (eds.) ACM CCS 2016, pp. 456–467. ACM Press, October 2016

    Google Scholar 

  20. Biham, E., Biryukov, A., Shamir, A.: Cryptanalysis of Skipjack reduced to 31 rounds using impossible differentials. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 12–23. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48910-X_2

    Chapter  Google Scholar 

  21. Biham, E., Shamir, A.: Differential cryptanalysis of the full 16-round DES. In: Brickell, E.F. (ed.) CRYPTO 1992. LNCS, vol. 740, pp. 487–496. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-48071-4_34

    Chapter  Google Scholar 

  22. Bilgin, B., Nikova, S., Nikov, V., Rijmen, V., Stütz, G.: Threshold implementations of all 3 \(\times \) 3 and 4 \(\times \) 4 S-boxes. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 76–91. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33027-8_5

    Chapter  MATH  Google Scholar 

  23. Biryukov, A., Derbez, P., Perrin, L.: Differential analysis and meet-in-the-middle attack against round-reduced TWINE. In: Leander, G. (ed.) FSE 2015. LNCS, vol. 9054, pp. 3–27. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48116-5_1

    Chapter  Google Scholar 

  24. Biryukov, A., Nikolić, I.: Complementing Feistel ciphers. In: Moriai, S. (ed.) FSE 2013. LNCS, vol. 8424, pp. 3–18. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43933-3_1

    Chapter  Google Scholar 

  25. Bogdanov, A., et al.: PRESENT: an ultra-lightweight block cipher. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 450–466. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74735-2_31

    Chapter  Google Scholar 

  26. Borghoff, J., et al.: PRINCE – a low-latency block cipher for pervasive computing applications. In: Wang, X., Sako, K. (eds.) ASIACRYPT 2012. LNCS, vol. 7658, pp. 208–225. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34961-4_14

    Chapter  Google Scholar 

  27. Boura, C., Naya-Plasencia, M., Suder, V.: Scrutinizing and improving impossible differential attacks: applications to CLEFIA, Camellia, LBlock and Simon. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol. 8873, pp. 179–199. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45611-8_10

    Chapter  Google Scholar 

  28. Cauchois, V., Gomez, C., Thomas, G.: General diffusion analysis: how to find optimal permutations for generalized type-II Feistel schemes. IACR Trans. Symmetric Cryptol. 2019(1), 264–301 (2019)

    Article  Google Scholar 

  29. Chakraborti, A., Iwata, T., Minematsu, K., Nandi, M.: Blockcipher-based authenticated encryption: how small can we go? In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 277–298. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66787-4_14

    Chapter  Google Scholar 

  30. Daemen, J., Knudsen, L., Rijmen, V.: The block cipher Square. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 149–165. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0052343

    Chapter  Google Scholar 

  31. Daemen, J., Peeters, M., Van Assche, G., Rijmen, V.: Nessie proposal: NOEKEON (2000). http://gro.noekeon.org/Noekeon-spec.pdf

  32. De Cannière, C., Dunkelman, O., Knežević, M.: KATAN and KTANTAN—a family of small and efficient hardware-oriented block ciphers. In: Clavier, C., Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 272–288. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04138-9_20

    Chapter  MATH  Google Scholar 

  33. Derbez, P.: Note on impossible differential attacks. In: Peyrin, T. (ed.) FSE 2016. LNCS, vol. 9783, pp. 416–427. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-52993-5_21

    Chapter  Google Scholar 

  34. Derbez, P., Fouque, P.-A., Lambin, B., Mollimard, V.: Efficient search for optimal diffusion layers of generalized Feistel networks. IACR Trans. Symmetric Cryptol. 2019(1), 218–240 (2019)

    Article  Google Scholar 

  35. Dinu, D., Le Corre, Y., Khovratovich, D., Perrin, L., Großschädl, J., Biryukov, A.: Triathlon of lightweight block ciphers for the internet of things. J. Cryptogr. Eng. 9(3), 283–302 (2019). https://doi.org/10.1007/s13389-018-0193-x

    Article  Google Scholar 

  36. Grosso, V., Leurent, G., Standaert, F.-X., Varıcı, K.: LS-designs: bitslice encryption for efficient masked software implementations. In: Cid, C., Rechberger, C. (eds.) FSE 2014. LNCS, vol. 8540, pp. 18–37. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46706-0_2

    Chapter  MATH  Google Scholar 

  37. Guo, J., Peyrin, T., Poschmann, A., Robshaw, M.: The LED block cipher. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 326–341. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23951-9_22

    Chapter  Google Scholar 

  38. Gupta, K.C., Pandey, S.K., Venkateswarlu, A.: Almost involutory recursive MDS diffusion layers. Des. Codes Cryptogr. 87(2), 609–626 (2018). https://doi.org/10.1007/s10623-018-0582-2

    Article  MathSciNet  MATH  Google Scholar 

  39. Hong, D., et al.: HIGHT: a new block cipher suitable for low-resource device. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 46–59. Springer, Heidelberg (2006). https://doi.org/10.1007/11894063_4

    Chapter  Google Scholar 

  40. Standard for cryptographic protection of data on block-oriented storage devices

    Google Scholar 

  41. Gurobi Optimization Inc.: Gurobi optimizer 6.5 (2015). http://www.gurobi.com/

  42. Iwata, T., Minematsu, K., Guo, J., Morioka, S.: CLOC: authenticated encryption for short input. In: Cid, C., Rechberger, C. (eds.) FSE 2014. LNCS, vol. 8540, pp. 149–167. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46706-0_8

    Chapter  MATH  Google Scholar 

  43. Jean, J., Moradi, A., Peyrin, T., Sasdrich, P.: Bit-sliding: a generic technique for bit-serial implementations of SPN-based primitives. In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 687–707. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66787-4_33

    Chapter  MATH  Google Scholar 

  44. Knudsen, L., Leander, G., Poschmann, A., Robshaw, M.J.B.: PRINTcipher: a block cipher for IC-printing. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 16–32. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15031-9_2

    Chapter  Google Scholar 

  45. Knudsen, L., Wagner, D.: Integral cryptanalysis. In: Daemen, J., Rijmen, V. (eds.) FSE 2002. LNCS, vol. 2365, pp. 112–127. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45661-9_9

    Chapter  Google Scholar 

  46. Kölbl, S.: AVX implementation of the Skinny block cipher (2019). https://github.com/kste/skinny_avx

  47. Krovetz, T., Rogaway, P.: The software performance of authenticated-encryption modes. In: Joux, A. (ed.) FSE 2011. LNCS, vol. 6733, pp. 306–327. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21702-9_18

    Chapter  MATH  Google Scholar 

  48. Matsui, M.: Linear cryptanalysis method for DES cipher. In: Helleseth, T. (ed.) EUROCRYPT 1993. LNCS, vol. 765, pp. 386–397. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-48285-7_33

    Chapter  Google Scholar 

  49. Mikhalev, V., Armknecht, F., Müller, C.: On ciphers that continuously access the non-volatile key. IACR Trans. Symmetric Cryptol. 2016(2), 52–79 (2016). http://tosc.iacr.org/index.php/ToSC/article/view/565

  50. Moradi, A., Poschmann, A., Ling, S., Paar, C., Wang, H.: Pushing the limits: a very compact and a threshold implementation of AES. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 69–88. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20465-4_6

    Chapter  Google Scholar 

  51. Mouha, N., Wang, Q., Gu, D., Preneel, B.: Differential and linear cryptanalysis using mixed-integer linear programming. In: Wu, C.-K., Yung, M., Lin, D. (eds.) Inscrypt 2011. LNCS, vol. 7537, pp. 57–76. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34704-7_5

    Chapter  MATH  Google Scholar 

  52. Naito, Y., Matsui, M., Sugawara, T., Suzuki, D.: SAEB: a lightweight blockcipher-based AEAD mode of operation. IACR TCHES 2018(2), 192–217 (2018). https://tches.iacr.org/index.php/TCHES/article/view/885

  53. Nyberg, K.: Generalized Feistel networks. In: Kim, K., Matsumoto, T. (eds.) ASIACRYPT 1996. LNCS, vol. 1163, pp. 91–104. Springer, Heidelberg (1996). https://doi.org/10.1007/BFb0034838

    Chapter  Google Scholar 

  54. Poschmann, A., Moradi, A., Khoo, K., Lim, C.-W., Wang, H., Ling, S.: Side-channel resistant crypto for less than 2,300 GE. J. Cryptol. 24(2), 322–345 (2011). https://doi.org/10.1007/s00145-010-9086-6

    Article  MathSciNet  MATH  Google Scholar 

  55. Sasaki, Yu., Aoki, K.: Finding preimages in full MD5 faster than exhaustive search. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 134–152. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01001-9_8

    Chapter  Google Scholar 

  56. Sasaki, Y., Todo, Y.: New impossible differential search tool from design and cryptanalysis aspects. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017. Part III. LNCS, vol. 10212, pp. 185–215. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56617-7_7

    Chapter  Google Scholar 

  57. Shibutani, K., Isobe, T., Hiwatari, H., Mitsuda, A., Akishita, T., Shirai, T.: Piccolo: an ultra-lightweight blockcipher. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 342–357. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23951-9_23

    Chapter  Google Scholar 

  58. Shirai, T., Shibutani, K., Akishita, T., Moriai, S., Iwata, T.: The 128-bit blockcipher CLEFIA (extended abstract). In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 181–195. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74619-5_12

    Chapter  Google Scholar 

  59. Sun, S., Hu, L., Wang, P., Qiao, K., Ma, X., Song, L.: Automatic security evaluation and (related-key) differential characteristic search: application to SIMON, PRESENT, LBlock, DES(L) and other bit-oriented block ciphers. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. Part I. LNCS, vol. 8873, pp. 158–178. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45611-8_9

    Chapter  Google Scholar 

  60. Suzaki, T., Minematsu, K.: Improving the generalized Feistel. In: Hong, S., Iwata, T. (eds.) FSE 2010. LNCS, vol. 6147, pp. 19–39. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13858-4_2

    Chapter  Google Scholar 

  61. Suzaki, T., Minematsu, K., Morioka, S., Kobayashi, E.: TWINE: a lightweight block cipher for multiple platforms. In: Knudsen, L.R., Wu, H. (eds.) SAC 2012. LNCS, vol. 7707, pp. 339–354. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35999-6_22

    Chapter  Google Scholar 

  62. Todo, Y.: Structural evaluation by generalized integral property. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. Part I. LNCS, vol. 9056, pp. 287–314. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_12

    Chapter  Google Scholar 

  63. Wingers, L.: SUPERCOP: SUPERCOP-20190110/crypto\_stream/simon128128ctr/avx2 (2019). https://bench.cr.yp.to/supercop/supercop-20190110.tar.xz

  64. Wu, W., Zhang, L.: LBlock: a lightweight block cipher. In: Lopez, J., Tsudik, G. (eds.) ACNS 2011. LNCS, vol. 6715, pp. 327–344. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21554-4_19

    Chapter  Google Scholar 

  65. Xiang, Z., Zhang, W., Bao, Z., Lin, D.: Applying MILP method to searching integral distinguishers based on division property for 6 lightweight block ciphers. In: Cheon, J.H., Takagi, T. (eds.) ASIACRYPT 2016. Part I. LNCS, vol. 10031, pp. 648–678. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53887-6_24

    Chapter  Google Scholar 

  66. Zheng, Y., Matsumoto, T., Imai, H.: On the construction of block ciphers provably secure and not relying on any unproved hypotheses. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 461–480. Springer, New York (1990). https://doi.org/10.1007/0-387-34805-0_42

    Chapter  Google Scholar 

Download references

Acknowledgement

Subhadeep Banik is supported by the Swiss National Science Foundation (SNSF) through the Ambizione Grant PZ00P2_179921. Zhenzhen Bao is partially supported by Nanyang Technological University in Singapore under Grant 04INS000397C230, and Singapore’s Ministry of Education under Grants RG18/19 and MOE2019-T2-1-060. Takanori Isobe is supported by Grant-in-Aid for Scientific Research (B) (KAKENHI 19H02141) for Japan Society for the Promotion of Science, Support Center for Advanced Telecommunications Technology Research (SCAT), and SECOM science and technology foundation. Kosei Sakamoto is supported by Grant-in-Aid for JSPS Fellows (KAKENHI 20J23526) for Japan Society for the Promotion of Science.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Takanori Isobe or Kazuhiko Minematsu .

Editor information

Editors and Affiliations

Appendices

A Test Vector

Table 6 shows the test vectors of WARP.

Table 6. Test vectors.

B Security Evaluation

In this section, we provide the security evaluations of WARP against differential, linear, integral, impossible differential, invariant and meet-in-the-middle attacks.

1.1 B.1 Differential/Linear Attack

Differential cryptanalsis [21] and linear cryptanalsis [48] are among the most powerful techniques available for block ciphers. To evaluate the security against differential and linear attacks, we compute the lower bound for the number of differentially and linearly active S-boxes with a MILP-aided automatic search method, which was proposed by Mouha et al. [51]. We use Gurobi [41] as the solver and search for all nibble-wise truncated differential and linear characteristics.

Table 7 shows the minimum number of differentially and linearly active S-boxes for up to 19 rounds in the single-key setting, where \(AS_D\) and \(AS_L\) denote the number of differentially and linearly active S-boxes, respectively. It can be observed from Table 7 that WARP has more than 64 active S-boxes after 19 rounds. Since the maximum differential probability and absolute linear bias of the S-box of WARP are both \(2^{-2}\) and the nibble-wise full diffusion requires 10 rounds, even with a 19-round differential distinguisher, we expect that an effective key-recovery attack cannot reach up to \(19+12=31\) rounds. In a word, the full-round WARP is secure against differential and linear attacks.

Table 7. The lower bound for the number of differentially and linearly active S-boxes in the single-key setting.

1.2 B.2 Impossible Differential Attack

Generally, an impossible differential attack [20] is one of the most powerful attacks against Feistel-type ciphers. The impossible differential attack exploits a pair of input-output difference denoted by \(\varDelta _{in}\) and \(\varDelta _{out}\) such that \(\varDelta _{in}\) will never reach \(\varDelta _{out}\) after several rounds.

As mentioned in Sect. 3, WARP achieves the full diffusion after 10 rounds at both the encryption and decryption sides in nibble-wise. Based on a more detailed investigation, we found that the full diffusion requires 12 rounds at both the encryption and decryption sides in bit-wise. Hence, there should be no probability-1 bit-wise impossible differential over 24 rounds.

In order to obtain the longest impossible differential distinguisher, we utilize an impossible differential search tool based on MILP designed by Sasaki and Todo [56]. Specifically, we evaluate the search space such that the plaintext difference and ciphertext difference activate only one bit, respectively. To model the propagation of differences through the 4-bit S-box, we take into account the differential distribution table for the 4-bit S-box. Based on the method as proposed in [59], it can be modeled with the linear inequalities.

As a result, we find the following 21-round impossible differential distinguisher.

According to Boura et al.’s work [27] in ASIACRYPT 2014 and the corresponding interpretation [33] by Derbez in FSE 2016, when extending the 21-round impossible differential distinguisher for 10 rounds, the required time complexity of the key-recovery attack is almost close to a pure exhaustive key search. Therefore, we do not expect an effective key-recovery attack on up to 32 rounds of WARP by using this 21-round impossible differential distinguisher. In a word, we expect that the full-round WARP is secure against the impossible differential attack.

1.3 B.3 Integral Attack

The integral attack was first proposed by Daemen et al. [30] and it was later formalized to the integral property by Knudsen and Wagner [45]. We define the four states for a set of \(2^n\) n-bit cell: A: if \(\forall i, j\) \(i \ne j \Leftrightarrow x_i \ne x_j\), C: if \(\forall i, j\) \(i \ne j \Leftrightarrow x_i = x_j\), B: \(\bigoplus ^{2^n -1}_i x_i \), and U: Other. The integral attack was further generalized to the division property by Todo [62], which can exploit the hidden feature between A and B states.

To evaluate the nibble-based division property, we use a MILP-aided automatic search method proposed by Xiang et al. [65], which enables us to efficiently explore the propagation of the division property. Specifically, we evaluate all the cases where 1, 2, 3 nibbles out of 32 nibbles are C and the others are all \(\mathbf{A}\) in plaintexts. Thus, we need to evaluate \(2^{23.2} \left( = \left( {\begin{array}{c}32\\ 1\end{array}}\right) + \left( {\begin{array}{c}32\\ 2\end{array}}\right) + \left( {\begin{array}{c}32\\ 3\end{array}}\right) \right) \) nibble-wise patterns.

In this way, we find the following 20-round integral distinguisher.

$$\begin{aligned} (\mathbf{AAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAA}) \nonumber \\ \xrightarrow {20~\mathrm{rounds}} (\mathbf{UU UB UB UU UU UU UU UB UU UU UU UU UU UU UB UU}) \nonumber \end{aligned}$$

However, due to the high data complexity of this integral distinguisher, one can extend only 1 round to achieve a key-recovery attack and its time complexity is almost close to an exhaustive key search. One may find an integral distinguisher by using a lower data complexity, but the number of rounds will be reduced. Considering that the nibble-wise full diffusion requires 10 rounds and that a common integral distinguisher covering larger rounds always requires a higher data complexity, we expect that the full-round WARP is secure against integral attacks.

1.4 B.4 Meet-in-the Middle Attack

We evaluate the security against the meet-in-the-middle attack following the method appeared in the self-evaluation of MIDORI [3] and CRAFT [15]. The 10-round full diffusion property guarantees that any inserted key-bit non-linearly affects all branches after the 10 rounds in the forward and the backward directions, respectively. Thus, the possible number of rounds used for the partial matching (PM) [55] is estimated as 19 \(( = (10 - 1) + (10 - 1) + 1)\). The condition for the initial structure (IS) [55] is that key differential trails in the forward direction and those in the backward direction do not share active non-linear components. For WARP, since any key differential affects all 16 S-boxes after at least 10 rounds in the forward and the backward directions, there is no such differential which shares active S-box in more than 10 rounds. Thus, the number of rounds used for IS is upper bounded by 9. Assuming that the splice-and-cut technique allows an attacker to add more 3 rounds in the worst case, at most 32-round (19 + 10 + 3) MitM attack may be feasible. However, because of the iterated key insertions of \(K_0\) and \(K_1\) for every two round, we consider that it is difficult to mount a 32-round attack on WARP.

1.5 B.5 Invariant Subspace Attack

We use LFSR-based round constants in each round. Following the notions presented in [12], we first tried to find the smallest L-invariant subspace that contains all roundkey differences. Here, L denotes the transformation that describes the linear layer in WARP. Since WARP adds two key halves in an alternating fashion, the master keys repeat every other round, the set of roundkey differences in the even/odd rounds are given by

where is the 128-bit vector defined as \((0^4\,\Vert \,RC_0^r\,\Vert \,0^4\,\Vert \,RC_1^r\,\Vert \,0^{112})\). Denoting \(D=D_{even}\cup D_{odd}\), we try to find \(W_L(D)\), which denotes the smallest L-invariant subspace containing D. We found that \(W_L(D)\) is a subspace of dimension 124, which does not automatically guarantee resistance to subspace attacks. As the invariant attack applies only if there is a non-trivial invariant g for the S-box layer such that \(W_L (D) \subset LS(g)\), where LS(g) is the subspace of all linear structures of the function g. We ran Algorithm 1 of [12] on \(Z=W_L(D)\) first to see if S(Z) hits all the cosets of Z. Experimentally we found that it does indeed, leading us to conclude that g is the constant function which guarantees security against subspace attacks.

C32-Branch Permutations with 9-Round Full Diffusion [34]

Table 8 shows four equivalent classes of 32-branch permutations of \(\pi '_{0}(x)\), \(\pi '_{1}(x)\), \(\pi '_{2}(x)\), and \(\pi '_{3}(x)\) achieving 9-round full diffusion found by Derbez et al. [34].

Table 9 is a comparison of lower bounds on the number of active S-boxes for WARP and four permutations by our MILP-based Active S-boxes counting. As shown in Table 9, the number of active S-boxes of \(\pi '_{0}(x)\), \(\pi '_{1}(x)\), \(\pi '_{2}(x)\), and \(\pi '_{3}(x)\) grows much slower than WARP. Specifically, \(\pi '_{0}(x)\), \(\pi '_{1}(x)\) and \(\pi '_{2}(x)\), \(\pi '_{3}(x)\) require at least 32 and 48 rounds for achieving AS-box of \({\ge } 64\), respectively.

Table 8. Four equivalent classes of 32-branch permutations with 9-round full diffusion [34].
Table 9. Lower bounds on the number of Active S-boxes for WARP and four permutations of \(\pi '_{0}(x)\), \(\pi '_{1}(x)\), \(\pi '_{2}(x)\), and \(\pi '_{3}(x)\)

D More Details About Hardware Implementations

1.1 D.1 Bit Serial Architecture

The nibble serial architecture can be converted to a bit serial architecture, with some simple circuit-level transformations. The first is explained in Fig. 8. Any nibblewise scan flip-flop can be serialized as shown in Fig. 8, so that only one scan flip-flop per nibble is utilized. Whereas in the nibble serial architecture, the circuit can transfer one of the 2 nibble signals in one clock cycle, the same can be done over 4 cycles in the bit-serial architecture. Thus the bit-serial circuit can perform the same set of round operations in \(48\times 4=192\) cycles, in 3 sets of 64 cycle operations as in the nibble serial circuit. We save \(18\times 3=54\) scan flip-flops in the bit-serial architecture, and also 4-bit xor gates and multiplexers can be replaced with corresponding single bit gates.

Fig. 8.
figure 8

Nibble to bit serial transformations

1.2 D.2 Unified Circuit for Encryption and Decryption

Implementing the functionalities of encryption and decryption (ED) on the same circuit can be beneficial in some instances. Various modes of operations like CBC, XTS, OCB and COLM [1], that use block ciphers as the underlying primitive, require access to both its encryption and decryption functionalities. Thus it is useful to have an implementation that achieves both functionalities of a block cipher with minimal overhead. There are several features in the Feistel network structure, that make it easier to construct the ED architecture. Some of them are as follows:

  1. 1.

    SPN structures generally require involutive S-boxes to ensure efficient ED implementation [3, 15]. If not, they require the circuits for the forward and inverse S-box to be implemented together, which increases area [7]. However the inverse round function in Feistel networks can be described with the forward S-box only. This gives us more freedom to search for S-boxes with lightweight characteristics.

  2. 2.

    Some SPN block ciphers, e.g., [15] require the decryption key to be equal to \(L\cdot K\), where L denotes the matrix that forms the linear layer of the cipher. Thus additional circuit for matrix multiplication is required. Also a multiplexer is required to filter these keys for encryption/decryption. In the architecture for WARP, this is not necessary.

  3. 3.

    Let \(F_K\) denote the function that performs S-box function and the roundkey addition on the left branch. Then by slight abuse of notation we can write the round function as

    $$\begin{aligned} Y_a= P(F_K(X_a))\oplus (X_b\lll 6), Y_b=X_a \end{aligned}$$
    (1)

    where P is the function that performs the nibblewise shuffle of the left branch by moving the i-th nibble to \(\uppi [i]\). Then it is easy to see that the inverse round function is \(X_a=Y_b, X_b=( P(F_K(Y_b))\oplus (Y_a))\ggg 6\). However we do omit the left-right swap in the last round, and as a result, decryption can be computed by iterating the following round function 40 times, followed by a “swapless” final round:

    $$\begin{aligned} X_a=( P(F_K(Y_a))\oplus (Y_b))\ggg 6, X_b=Y_a. \end{aligned}$$
    (2)

    Equations (1) and (2) are similar except that in encryption, the right branch is left rotated by 6 nibbles before addition, whereas in decryption, rotation done is after xoring left and right branches, this time by 6 nibbles towards the right. Since WARP uses each half of the master key in alternate rounds for key addition, it has been designed to have odd number of rounds. This means the first and last round encryption keys are the same, which implies that the encryption and decryption uses the left and right halves of the key in the same order.

Thus the only real overhead in the ED circuit for WARP is to accommodate left and right rotation by six nibbles in different times of the decryption cycle, and arrange for round constants to be generated in the reverse direction, which only requires some strategically placed multiplexers to accommodate the timing of these operations in the decryption cycle.

In essence, one approach would be to not rotate the right branch during cycles 0 to 15, and do rotation only during cycles 31 to 47 when the xoring of right and left branches has been completed. However, this approach is slightly problematic to adopt, as cycles 31 to 47 are used to not only swap left and right branches but also to apply \(\uppi ^{-1}\) to the left branch as it is being moved to bottom rows of flip-flops. In such a situation the bottom rows cannot accommodate two different types of permutation operations at the same time.

As a result, we need to exercise some fine-grained control over the ED circuit. For decryption, we rotate the right branch by 10 nibbles left in cycles 0–15 (same as 6 nibbles right rotation), although this rotation is not required as per Eq. (2). Thus to maintain functionality, in cycles 16–31, we drive nibbles out through register 11 to do xor between left and right branches (this was done through flip-flop 17 during encryption). After xor, the incoming nibbles are driven in through flip-flop 12 (this was done through 00 during encryption). This method has the added advantage that after round 31, the flip-flops \(17-00\) already contain \(( P(F_K(Y_a))\oplus (Y_b))\ggg 6\). This allows us to have the decryption operations in cycles 32–47 exactly the same in encryption.

For completeness, we discuss two more issues. First, for decryption we choose, cycles 0 and 1 for round constant addition, as this operation has to precede the non-linear operations. To rotate left by 10 nibbles, we need to freeze rotation for 6 cycles. Like in encryption, we divide the bottom row into groups of 6, 6, 2, 2 flip-flops and do internal rotation in these for 6 cycles. To accommodate this operation we need to replace 3 normal flip-flop nibbles with scan flip-flop nibbles.

1.3 D.3 Threshold Implementations

The s-box belongs to the cubic class \(\mathcal {C}_{266}\) as per the classification in [22] and as such it can be decomposed into 2 quadratic s-boxes \(F~\circ ~ G\), where

$$\begin{aligned} G=&~[\texttt {0, f, 6, 1, 3, 8, d, e, 4, b, 2, 5, 7, c, 9, a}],\\ F=&~[\texttt {c, 3, 1, e, 8, 5, d, 0, b, 4, 6, 9, 2, f, 7, a}] \end{aligned}$$

Since a minimum of \(d+1\) shares are required to implement the 1st order threshold implementation (TI) of a degree d s-box, we can thus implement a 3 share 1st-order TI in the manner shown in [54]. The idea is to implement the TI of G and F separated by a register bank in between, which suppresses the glitches produced by the TI of G.

Fig. 9.
figure 9

Sketch of the 3 share nibble serial architecture for WARP

Since the s-box has degree 3, a straightforward 4-share TI can also be implemented using a direct sharing approach. With regards to circuit architecture, the 4 share versions would only consist of 4 copies of the unshared circuit combined through a shared s-box. The 3-share circuit is slightly complicated owing to the fact that the shared GF functions have to be executed one after the other. We implemented it in the manner shown in Fig. 9. In the unprotected circuit described earlier, the S-box layer is computed in cycles 16 to 31. And so in the shared circuit, we implement the shared function G in cycles 15 to 30 and the shared function F in cycles 16 to 31. However this creates another problem, as the entire left branch is overwritten by the output of the shared G layer before being fed back into register 20. Since the current left branch is required to serve the role of the right branch in the subsequent round, we need to invert the G layer before we proceed to the next round. This is done by implementing a shared implementation of the quadratic s-box \(G^{-1}\) between registers 20 and 21, which is operated from cycles 17 to 32.

Table 10. Comparison of performance metrics for serial implementations synthesized with STM 90 nm Standard cell library. (RB denotes round based circuit, 3s, 4s denotes circuits with 3, 4 shares respectively) *Synthesized with the UMC 180 nm process/Power at 100 KHz. **Synthesized with the IBM 130 nm process/power at 100 KHz

Table 10 shows the performance results for the 3-share and 4-share implementations of WARP. The smallest 3-share implementation stands at 1964 GE which is smaller than known implementations of SKINNY and PRESENT (although these are computed at different level of serialization).

E More Details of Software Implementations

1.1 E.1 Details of Software Implementations on 8-Bit AVR

Our implementations of WARP on 8-bit AVR are in assembly and compiled using AVR macro assembler 2.2.7 in Atmel Studio 7.0. All implementations use the LBlock-type structure (see Fig. 4). Detailed implementation choices are: 1) One-round or two-round unrolling: combining two rounds save the shuffle operation between left and right branches, trading ROM for CPU cycles; 2) One-S-box or two-S-box: combining two S-boxes can reduce times of memory accesses, trading ROM or RAM for CPU cycles; 3) Storing the LUT of the S-box in RAM or ROM. Table 11 presents the results and comparisons.

Table 11. Software performance of WARP on 8-bit AVR.

The target device is ATmega128; The scenario is encryption/decryption of 128 bytes of data in CBC mode [35]. For ROM, that consumed by the main function for initializing data and calling the enc/dec functions are subtracted. For RAM, that required for storing the data to be processed, the master key, and the initialization vector are subtracted. WARP [1R] is for the one-round-based implementation storing the LUT of two-S-box in ROM. WARP [2R] is for the two-round-unrolled implementation storing the LUT of two-S-box in RAM. Results of other ciphers are from https://www.cryptolux.org/index.php/FELICS_Block_Ciphers_Brief_Results.

1.2 E.2 Details of Software Implementations on x64 CPUs

Similar to TWINE [61], WARP can be transformed into an equivalent form, in which only half branches go through a nibble shuffle per round. Accordingly, our SIMD implementations use this equivalent form. Besides, to take advantage of the pipelined execution unit on modern CPUs, we provide additional options on parallelism besides the double-block using 256-bit registers – compute each atomic step on quadruple/octuple data blocks. Our benchmark results of WARP, together with that of SIMON and SKINNY, are reported in Fig. 10.

Fig. 10.
figure 10

Software performance of WARP, SIMON and SKINNY on the same processor.

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Banik, S. et al. (2021). WARP : Revisiting GFN for Lightweight 128-Bit Block Cipher. In: Dunkelman, O., Jacobson, Jr., M.J., O'Flynn, C. (eds) Selected Areas in Cryptography. SAC 2020. Lecture Notes in Computer Science(), vol 12804. Springer, Cham. https://doi.org/10.1007/978-3-030-81652-0_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-81652-0_21

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-81651-3

  • Online ISBN: 978-3-030-81652-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics