Abstract
This paper presents an optimization method to build the smallest possible integer mapping unit that can replace a conventional multiply-and-accumulate unit in deep learning applications. The unit is built using a hardware-software co-design strategy that minimizes both the set of represented real values and the energy consumed. We target larger and more complex deep learning application domains than those explored in previous related works, namely generative models for image and text content. Our key result is that, using our proposed method, we can produce a set as small as 4 entries for an image enhancement application, and 16–32 entries for the GPT-2 model, all with minimal loss of quality. Experimental results show that a hardware accelerator designed using our approach can reduce processing time by up to \(1.98\times\)/\(3.62\times\) and computation energy by up to \(1.7\times\)/\(8.4\times\) compared to 8-bit integer/16-bit floating-point alternatives, respectively.
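To make the idea concrete, the sketch below shows how a dot product can be evaluated using only a tiny set of represented real values and a precomputed product table instead of a hardware multiplier. This is a minimal illustration in plain NumPy under our own assumptions; the names `REAL_SET`, `quantize`, and `lut_dot` are hypothetical, and this is not the paper's Bedot implementation:

```python
# Minimal sketch of a lookup-table dot product (hypothetical names,
# not the paper's Bedot design): operands are quantized to a tiny set
# of represented real values, and each multiply becomes a table read.
import numpy as np

# A hypothetical 4-entry set of represented real values (the paper
# reports sets as small as 4 entries for an image enhancement model).
REAL_SET = np.array([-1.0, -0.25, 0.25, 1.0])

# Precompute all pairwise products once; at inference time a multiply
# is replaced by indexing this table with the two operands' indices.
PRODUCT_LUT = REAL_SET[:, None] * REAL_SET[None, :]

def quantize(x: np.ndarray) -> np.ndarray:
    """Map each value to the index of its nearest entry in REAL_SET."""
    return np.abs(x[..., None] - REAL_SET).argmin(axis=-1)

def lut_dot(a_idx: np.ndarray, b_idx: np.ndarray) -> float:
    """Dot product using only table lookups and accumulation."""
    return float(PRODUCT_LUT[a_idx, b_idx].sum())

# Usage: quantize once, then accumulate table entries (no multiplier).
a = np.random.randn(8).astype(np.float32)
b = np.random.randn(8).astype(np.float32)
approx = lut_dot(quantize(a), quantize(b))
```

With a 4-entry set on each operand, the product table holds only 16 entries, which is what allows a small mapping unit to stand in for a full multiplier.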
Notes
1. ESRGAN can compare its output against the original images. For Set5, the model achieves a PSNR of 30.8/28/29/30.3 dB with FP32/Bedot/Bedot+H/INT8, respectively. The reduction in quality follows the same trend when comparing against the FP32 output, so we also use the FP32 images in Table 2 for a consistent comparison.
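The comparison above relies on the standard peak signal-to-noise ratio. As a reference for how the metric is computed, here is a minimal sketch (plain NumPy, assuming 8-bit images in the range [0, 255]; the `psnr` helper is ours, not code from the paper):

```python
# Standard PSNR definition in dB, assuming 8-bit images (peak = 255).
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two same-shape images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```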
References
Lin, Y., Li, Y., Liu, T., Xiao, T., Liu, T., Zhu, J.: Towards fully 8-bit integer inference for the transformer model. arXiv preprint arXiv:2009.08034 (2020)
Wang, P., et al.: QGAN: quantized generative adversarial networks. arXiv preprint arXiv:1901.08263 (2019)
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. Adv. Neural. Inf. Process. Syst. 29, 2234–2242 (2016)
Wu, H., Judd, P., Zhang, X., Isaev, M., Micikevicius, P.: Integer quantization for deep learning inference: principles and empirical evaluation. arXiv preprint arXiv:2004.09602 (2020)
Kim, S., Kum, K.-I., Sung, W.: Fixed-point optimization utility for C and C++ based digital signal processing programs. IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process. 45(11), 1455–1464 (1998)
Kum, K.-I., Kang, J., Sung, W.: AUTOSCALER for C: an optimizing floating-point to integer C program converter for fixed-point digital signal processors. IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process. 47(9), 840–848 (2000)
Cong, J., Liu, B., Neuendorffer, S., Noguera, J., Vissers, K., Zhang, Z.: High-level synthesis for FPGAs: from prototyping to deployment. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 30(4), 473–491 (2011)
Ho, N.-M., Wong, W.-F.: Exploiting half precision arithmetic in Nvidia GPUs. In: 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–7. IEEE (2017)
Higham, N.J., Mary, T.: Mixed precision algorithms in numerical linear algebra. Acta Numer. 31, 347–414 (2022)
Ho, N.-M., Manogaran, E., Wong, W.-F., Anoosheh, A.: Efficient floating point precision tuning for approximate computing. In: 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 63–68. IEEE (2017)
De Silva, H., Santosa, A.E., Ho, N.-M., Wong, W.-F.: ApproxSymate: path sensitive program approximation using symbolic execution. In: Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pp. 148–162 (2019)
Gustafson, J.L., Yonemoto, I.T.: Beating floating point at its own game: posit arithmetic. Supercomput. Front. Innov. 4(2), 71–86 (2017)
Ciocirlan, S.D., Loghin, D., Ramapantulu, L., Ţăpuş, N., Teo, Y.M.: The accuracy and efficiency of posit arithmetic. In: 2021 IEEE 39th International Conference on Computer Design (ICCD), pp. 83–87. IEEE (2021)
Gholami, A., Kim, S., Dong, Z., Yao, Z., Mahoney, M.W., Keutzer, K.: A survey of quantization methods for efficient neural network inference. arXiv preprint arXiv:2103.13630 (2021)
Krishnamoorthi, R.: Quantizing deep convolutional networks for efficient inference: a whitepaper. arXiv preprint arXiv:1806.08342 (2018)
Li, Y., Dong, X., Wang, W.: Additive powers-of-two quantization: an efficient non-uniform discretization for neural networks. arXiv preprint arXiv:1909.13144 (2019)
Cococcioni, M., Rossi, F., Ruffaldi, E., Saponara, S., de Dinechin, B.D.: Novel arithmetics in deep neural networks signal processing for autonomous driving: challenges and opportunities. IEEE Signal Process. Mag. 38(1), 97–110 (2020)
Ho, N.-M., Nguyen, D.-T., De Silva, H., Gustafson, J.L., Wong, W.-F., Chang, I.J.: Posit arithmetic for the training and deployment of generative adversarial networks. In: 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1350–1355. IEEE (2021)
Zhou, Y., Moosavi-Dezfooli, S.-M., Cheung, N.-M., Frossard, P.: Adaptive quantization for deep neural network. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Li, Y., et al.: BRECQ: pushing the limit of post-training quantization by block reconstruction. arXiv preprint arXiv:2102.05426 (2021)
Zafrir, O., Boudoukh, G., Izsak, P., Wasserblat, M.: Q8BERT: quantized 8bit BERT. arXiv preprint arXiv:1910.06188 (2019)
Chen, Y.-H., Krishna, T., Emer, J.S., Sze, V.: Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J. Solid-State Circuits 52(1), 127–138 (2016)
Ramanathan, A.K., et al.: Look-up table based energy efficient processing in cache support for neural network acceleration. In: 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 88–101. IEEE (2020)
Sun, X., et al.: Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks. Adv. Neural. Inf. Process. Syst. 32, 4900–4909 (2019)
Tukey, J.W.: Exploratory Data Analysis, vol. 2. Addison-Wesley, Reading (1977)
Dawson, R.: How significant is a boxplot outlier? J. Stat. Educ. 19(2) (2011)
Micikevicius, P., et al.: Mixed precision training. arXiv preprint arXiv:1710.03740 (2017)
Langroudi, H.F., Karia, V., Gustafson, J.L., Kudithipudi, D.: Adaptive posit: parameter aware numerical format for deep learning inference on the edge. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 726–727 (2020)
Lu, J., Fang, C., Xu, M., Lin, J., Wang, Z.: Evaluations on deep neural networks training using posit number system. IEEE Trans. Comput. 70(2), 174–187 (2020)
Anonymous: Anonymous demo (2021). https://colab.research.google.com/drive/1mT-tBy5gpn8lassGIlYwS9q1cAW9O5ot?usp=sharing
Ho, N.-M., De Silva, H., Gustafson, J.L., Wong, W.-F.: Qtorch+: next generation arithmetic for PyTorch machine learning. In: Gustafson, J., Dimitrov, V. (eds.) CoNGA 2022. LNCS, vol. 13253, pp. 31–49. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-09779-9_3
Zhang, T., Lin, Z., Yang, G., De Sa, C.: QPyTorch: a low-precision arithmetic simulation framework. arXiv preprint arXiv:1910.04540 (2019)
Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)
Wang, X., et al.: ESRGAN: enhanced super-resolution generative adversarial networks. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018)
Bevilacqua, M., Roumy, A., Guillemot, C., Alberi-Morel, M.L.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: Proceedings of the British Machine Vision Conference (BMVC). BMVA Press (2012)
Zeyde, R., Elad, M., Protter, M.: On single image scale-up using sparse-representations. In: Boissonnat, J.-D., et al. (eds.) Curves and Surfaces 2010. LNCS, vol. 6920, pp. 711–730. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27413-8_47
Radford, A., et al.: Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9 (2019)
Merity, S., Xiong, C., Bradbury, J., Socher, R.: The WikiText long term dependency language modeling dataset (2016)
Wolf, T., et al.: HuggingFace's transformers: state-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019)
Krishnamoorthi, R., Reed, J., Ni, M., Gottbrath, C., Weidman, S.: Introduction to quantization on PyTorch (2020). https://pytorch.org/blog/introduction-to-quantization-on-pytorch/
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Bahl, L.R., Jelinek, F., Mercer, R.L.: A maximum likelihood approach to continuous speech recognition. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5(2), 179–190 (1983)
Acknowledgements
This research/project is supported in part by the Ministry of Education, Singapore, under the Academic Research Fund Tier 1 (FY2018) and the Next Generation Arithmetic grant from the National Supercomputing Centre, A*STAR, Singapore.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ho, N.-M., Nguyen, D.-T., Gustafson, J.L., Wong, W.-F. (2023). Bedot: Bit Efficient Dot Product for Deep Generative Models. In: Gustafson, J., Leong, S.H., Michalewicz, M. (eds.) Next Generation Arithmetic. CoNGA 2023. Lecture Notes in Computer Science, vol. 13851. Springer, Cham. https://doi.org/10.1007/978-3-031-32180-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32179-5
Online ISBN: 978-3-031-32180-1