Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors

Journal of Signal Processing Systems

Abstract

Advanced bit manipulation operations are not efficiently supported by commodity word-oriented microprocessors. Programming tricks are typically devised to shorten the long sequence of instructions needed to emulate these complicated bit operations. As these bit manipulation operations are relevant to applications that are becoming increasingly important, we propose direct support for them in microprocessors. In particular, we propose fast bit gather (or parallel extract), bit scatter (or parallel deposit) and bit permutation instructions (including group, butterfly and inverse butterfly). We show that all these instructions can be implemented efficiently using the fast butterfly and inverse butterfly network datapaths. Specifically, we show that parallel deposit can be mapped onto a butterfly circuit and parallel extract can be mapped onto an inverse butterfly circuit. We define static, dynamic and loop-invariant versions of the instructions, with the static versions using a much simpler functional unit. We show how a hardware decoder can be implemented for the dynamic and loop-invariant versions to generate, at run time, the control signals for the butterfly and inverse butterfly datapaths. The simplest functional unit we propose is smaller and faster than an ALU. We also show that these instructions yield significant speedups over a basic RISC architecture for a variety of application kernels taken from application domains including bioinformatics, steganography, coding, compression and random number generation.
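To make the semantics of bit gather and bit scatter concrete, the following is a minimal software sketch of the two operations. It is a bit-serial reference model only, written for illustration: the function names `pext` and `pdep`, the `width` parameter and the 64-bit default are assumptions of this sketch, and the loop is exactly the kind of long emulation sequence that the proposed instructions are meant to replace. It does not model the butterfly or inverse butterfly datapaths.

```python
def pext(x: int, mask: int, width: int = 64) -> int:
    """Parallel extract (bit gather), reference model.

    Collects the bits of x at the positions where mask is 1 and packs
    them, in order, into the low-order end of the result.
    """
    result = 0
    k = 0  # next free position in the packed result
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((x >> i) & 1) << k
            k += 1
    return result


def pdep(x: int, mask: int, width: int = 64) -> int:
    """Parallel deposit (bit scatter), reference model.

    Takes the low-order bits of x and places them, in order, at the
    positions where mask is 1; all other result bits are 0.
    """
    result = 0
    k = 0  # next low-order bit of x to deposit
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((x >> k) & 1) << i
            k += 1
    return result


# Gather the nibbles selected by the mask, then scatter them back.
print(hex(pext(0xABCD, 0x0F0F)))  # 0xbd
print(hex(pdep(0xBD, 0x0F0F)))    # 0xb0d
```

In this model the two operations are inverses on the masked positions: depositing the result of an extract under the same mask reproduces `x & mask`.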

Acknowledgements

This work was supported in part by the Department of Defense and a research gift from Intel Corporation. Hilewitz is also supported by a Hertz Foundation Graduate Fellowship and an NSF Graduate Fellowship. The authors would also like to thank Roger Golliver of Intel Corporation for suggesting some applications that might benefit from bit manipulation instructions.

Author information

Corresponding author

Correspondence to Ruby B. Lee.

About this article

Cite this article

Hilewitz, Y., Lee, R.B. Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors. J Sign Process Syst Sign Image Video Technol 53, 145–169 (2008). https://doi.org/10.1007/s11265-008-0212-8

  • DOI: https://doi.org/10.1007/s11265-008-0212-8
