An efficient segmented quantization for graph neural networks

  • Regular Paper
  • Published in CCF Transactions on High Performance Computing

Abstract

Graph Neural Networks (GNNs) are machine learning approaches that extend the advances of neural networks to a wide range of graph applications. While GNNs achieve promising inference accuracy improvements over conventional approaches, their efficiency suffers from expensive computation and intensive memory access in the feature aggregation and combination phases, leading to high inference latency. Recent studies proposed mixed-precision feature quantization to address the memory access overhead; however, its linear approximation and computational complexity become the main constraints on overall GNN accuracy and performance. In this paper, we propose segmented quantization, which partitions the feature range into segments, customizes the linear approximation within each segment according to the original value density, and performs efficient mixed-precision computation between quantized features and full-precision weights. Segmented quantization helps achieve high inference accuracy while maintaining low computation complexity. We also devise a hardware accelerator to fully exploit the benefits of segmented quantization. Our experiments show improvements of up to 5% in average accuracy and up to 6.8× in performance over state-of-the-art GNN accelerators.
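
To make the idea concrete, the following minimal Python sketch shows density-aware segmented quantization of node features followed by a mixed-precision combination with full-precision weights. It is illustrative only: the equal-count (quantile) segment boundaries, the 4-bit uniform quantizer per segment, and the dequantize-then-multiply combination are assumptions made for this example, not the paper's exact scheme or the proposed accelerator's datapath.

    # Illustrative sketch of segmented feature quantization (not the paper's exact design).
    import numpy as np

    def segment_boundaries(x, num_segments=4):
        # Equal-count (quantile) boundaries: dense value regions get narrower
        # segments, so the per-segment linear approximation is finer where
        # values concentrate.
        return np.quantile(x, np.linspace(0.0, 1.0, num_segments + 1))

    def quantize_segmented(x, bounds, bits=4):
        # A uniform (linear) quantizer applied independently inside each segment.
        levels = 2 ** bits - 1
        seg = np.clip(np.searchsorted(bounds, x, side="right") - 1, 0, len(bounds) - 2)
        lo, hi = bounds[seg], bounds[seg + 1]
        scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
        codes = np.round((x - lo) / scale).astype(np.int32)
        return codes, scale, lo

    def dequantize(codes, scale, lo):
        return codes * scale + lo

    # Usage: quantize node features, then combine them with full-precision (FP32) weights.
    rng = np.random.default_rng(0)
    features = rng.standard_normal((8, 16)).astype(np.float32)  # node feature matrix
    weights = rng.standard_normal((16, 4)).astype(np.float32)   # FP32 layer weights

    bounds = segment_boundaries(features)
    codes, scale, lo = quantize_segmented(features, bounds)
    out = dequantize(codes, scale, lo) @ weights                # mixed-precision combination
    print("max abs error vs FP32:", np.abs(out - features @ weights).max())

Because segment widths follow the value distribution, densely populated ranges receive finer quantization steps than a single linear approximation over the whole range would provide, which is the intuition behind the accuracy gains reported above.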


Author information

Corresponding author: Yue Dai.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Dai, Y., Tang, X. & Zhang, Y. An efficient segmented quantization for graph neural networks. CCF Trans. HPC 4, 461–473 (2022). https://doi.org/10.1007/s42514-022-00121-z

