An efficient segmented quantization for graph neural networks

  • Regular Paper
  • Published in CCF Transactions on High Performance Computing

Abstract

Graph Neural Networks (GNNs) are machine learning approaches that extend the advances of neural networks to a wide range of graph applications. While GNNs achieve promising inference accuracy improvements over conventional approaches, their efficiency suffers from expensive computation and intensive memory access in the feature aggregation and combination phases, leading to high inference latency. Recent studies proposed mixed-precision feature quantization to address the memory access overhead; however, its linear approximation and computational complexity become the main constraints on overall GNN accuracy and performance. In this paper, we propose segmented quantization, which partitions the feature range into segments, customizes the linear approximation within each segment according to the original value density, and performs efficient mixed-precision computation between quantized features and full-precision weights. Segmented quantization helps achieve high inference accuracy while maintaining low computation complexity. We also devise a hardware accelerator to fully exploit the benefits of segmented quantization. Our experiments show improvements of up to 5% in average accuracy and up to 6.8× in performance over state-of-the-art GNN accelerators.
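
To make the idea concrete, the following minimal Python sketch shows density-aware segmented quantization of node features followed by a mixed-precision combination with full-precision weights. It is illustrative only: the equal-count (quantile) segment boundaries, the 4-bit uniform quantizer per segment, and the dequantize-then-multiply combination are assumptions made for this example, not the paper's exact scheme or the proposed accelerator's datapath.

    # Illustrative sketch of segmented feature quantization (not the paper's exact design).
    import numpy as np

    def segment_boundaries(x, num_segments=4):
        # Equal-count (quantile) boundaries: dense value regions get narrower
        # segments, so the per-segment linear approximation is finer where
        # values concentrate.
        return np.quantile(x, np.linspace(0.0, 1.0, num_segments + 1))

    def quantize_segmented(x, bounds, bits=4):
        # A uniform (linear) quantizer applied independently inside each segment.
        levels = 2 ** bits - 1
        seg = np.clip(np.searchsorted(bounds, x, side="right") - 1, 0, len(bounds) - 2)
        lo, hi = bounds[seg], bounds[seg + 1]
        scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
        codes = np.round((x - lo) / scale).astype(np.int32)
        return codes, scale, lo

    def dequantize(codes, scale, lo):
        return codes * scale + lo

    # Usage: quantize node features, then combine them with full-precision (FP32) weights.
    rng = np.random.default_rng(0)
    features = rng.standard_normal((8, 16)).astype(np.float32)  # node feature matrix
    weights = rng.standard_normal((16, 4)).astype(np.float32)   # FP32 layer weights

    bounds = segment_boundaries(features)
    codes, scale, lo = quantize_segmented(features, bounds)
    out = dequantize(codes, scale, lo) @ weights                # mixed-precision combination
    print("max abs error vs FP32:", np.abs(out - features @ weights).max())

Because segment widths follow the value distribution, densely populated ranges receive finer quantization steps than a single linear approximation over the whole range would provide, which is the intuition behind the accuracy gains reported above.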


Author information

Corresponding author: Yue Dai.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Dai, Y., Tang, X. & Zhang, Y. An efficient segmented quantization for graph neural networks. CCF Trans. HPC 4, 461–473 (2022). https://doi.org/10.1007/s42514-022-00121-z

