Skip to main content

Explore Hierarchical Relations Reasoning and Global Information Aggregation

  • Conference paper
  • First Online:
Book cover Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Abstract

Existing Graph Convolution Networks (GCNs) mainly explore the depth structure with focusing on aggregating information from the local 1-hop neighbor, leading to the common issues of over-smoothing and gradient vanishing, and further hurting the model performance on downstream tasks. To alleviate these deficiencies, we tentatively explore the width of the graph structure and propose Wider Graph Convolution Network (WGCN), targeting to reason the hierarchical relations and aggregate the global information from multi-hops dilated neighbor in a wide but shallow framework. Meanwhile, Dynamic Graph Convolution (DGC) is adopted via the masking and attention mechanisms to distinguish and re-weight informative neighborhood nodes, for stabilizing the model optimization. We demonstrate the effectiveness of WGCN on three popular tasks, including Document classification, Scene text detection and Point cloud segmentation. On the basis of achieving state-of-the-art performance among presented tasks, we verify that exploration through width expansion of GCNs is more effective than stacking more layers for model improvements.

Supported by NSFC project Grant No. U1833101, SZSTI under Grant No. JCYJ20190809172201639, the Joint Research Center of Tencent and Tsinghua, and National Key R&D Program of China (No. 2020AAA0108303).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9365–9374 (2019)

    Google Scholar 

  2. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: NeurIPS, pp. 3844–3852 (2016)

    Google Scholar 

  3. Deng, D., Liu, H., Li, X., Cai, D.: Pixellink: detecting scene text via instance segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)

    Google Scholar 

  4. Gao, H., Wang, Z., Ji, S.: Large-scale learnable graph convolutional networks. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1416–1424 (2018)

    Google Scholar 

  5. Gong, L., Cheng, Q.: Exploiting edge features for graph neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9211–9219 (2019)

    Google Scholar 

  6. Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: NeurIPS, pp. 1024–1034 (2017)

    Google Scholar 

  7. Han, L., Zheng, T., Xu, L., Fang, L.: Occuseg: occupancy-aware 3d instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2940–2949 (2020),

    Google Scholar 

  8. Jiang, L., Zhao, H., Shi, S., Liu, S., Fu, C.W., Jia, J.: Pointgroup: dual-set point grouping for 3d instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4867–4876 (2020)

    Google Scholar 

  9. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)

  10. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

  11. Landrieu, L., Boussaha, M.: Point cloud oversegmentation with graph-structured deep metric learning. arXiv preprint arXiv:1904.02113 (2019)

  12. Li, G., Muller, M., Thabet, A., Ghanem, B.: Deepgcns: can gcns go as deep as cnns? In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9267–9276 (2019)

    Google Scholar 

  13. Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: Pointcnn: convolution on x-transformed points. In: NeurIPS, pp. 820–830 (2018)

    Google Scholar 

  14. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp. 2117–2125 (2017)

    Google Scholar 

  15. Liu, P., Qiu, X., Huang, X.: Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101 (2016)

  16. Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: Textsnake: a flexible representation for detecting text of arbitrary shapes. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 20–36 (2018)

    Google Scholar 

  17. Nathani, D., Chauhan, J., Sharma, C., Kaul, M.: Learning attention-based embeddings for relation prediction in knowledge graphs. arXiv:1906.01195 (2019)

  18. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)

    Google Scholar 

  19. Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2550–2558 (2017)

    Google Scholar 

  20. Simonovsky, M., Komodakis, N.: Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3693–3702 (2017)

    Google Scholar 

  21. Tang, J., Qu, M., Mei, Q.: Pte: predictive text embedding through large-scale heterogeneous text networks. In: KDD, pp. 1165–1174. ACM (2015)

    Google Scholar 

  22. Wang, G., et al.: Joint embedding of words and labels for text classification. arXiv preprint arXiv:1805.04174 (2018)

  23. Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)

    Google Scholar 

  24. Wang, W., et al.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 8440–8449 (2019)

    Google Scholar 

  25. Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph cnn for learning on point clouds. ACM Trans. Graphics (TOG) 38(5), 1–12 (2019)

    Google Scholar 

  26. Wang, Y., Xie, H., Zha, Z.J., Xing, M., Fu, Z., Zhang, Y.: Contournet: taking a further step toward accurate arbitrary-shaped scene text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11753–11762 (2020)

    Google Scholar 

  27. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Philip, S.Y.: A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. (2020)

    Google Scholar 

  28. Yao, L., Mao, C., Luo, Y.: Graph convolutional networks for text classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 7370–7377 (2019)

    Google Scholar 

  29. Ye, X., Li, J., Huang, H., Du, L., Zhang, X.: 3D recurrent neural networks with context fusion for point cloud semantic segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 403–417 (2018)

    Google Scholar 

  30. Zhang, S.X., et al.: Deep relational reasoning graph network for arbitrary shape text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9699–9708 (2020)

    Google Scholar 

  31. Zhou, X., et al.: East: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5551–5560 (2017)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Chun Yuan .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Li, L., Yuan, C., Fan, K. (2021). Explore Hierarchical Relations Reasoning and Global Information Aggregation. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science(), vol 12821. Springer, Cham. https://doi.org/10.1007/978-3-030-86549-8_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-86549-8_24

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86548-1

  • Online ISBN: 978-3-030-86549-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics