Toward Faster and Simpler Matrix Normalization via Rank-1 Update

Yu, Tan; Cai, Yunfeng; Li, Ping

doi:10.1007/978-3-030-58529-7_13

Tan Yu, Yunfeng Cai & Ping Li

Conference paper
First Online: 13 November 2020

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12364)

Abstract

Bilinear pooling has been used in many computer vision tasks, and recent studies have found that matrix normalization is a vital step for achieving strong performance with bilinear pooling. The standard matrix normalization, however, requires singular value decomposition (SVD), which is not well suited to GPU platforms and therefore limits efficiency in both training and inference. To resolve this issue, the Newton-Schulz (NS) iteration has been proposed to approximate the matrix square root. Although it is GPU-friendly, the NS iteration still takes several expensive iterations of matrix-matrix multiplication. Furthermore, the NS iteration is incompatible with the compact bilinear features obtained from Tensor Sketch (TS) or Random Maclaurin (RM). To overcome these limitations, in this paper we propose “rank-1 update normalization” (RUN), which needs only matrix-vector multiplications and is hence substantially more efficient than the NS iteration, which uses matrix-matrix multiplications. Moreover, RUN readily supports normalization of compact bilinear features from TS or RM. RUN is also simpler than the NS iteration and easier to implement in practice. As RUN is a differentiable procedure, we can plug it into a CNN-based, end-to-end training setting. Extensive experiments on four public benchmarks demonstrate that, for full bilinear pooling, RUN achieves comparable accuracy with a substantial speedup over the NS iteration. For compact bilinear pooling, RUN achieves comparable accuracy with a significant speedup over SVD-based normalization.
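To make the cost contrast in the abstract concrete, the following is a minimal, dependency-free sketch. The first function is the textbook coupled Newton-Schulz iteration for the square root of a symmetric positive-definite matrix, which spends three matrix-matrix products per iteration. The abstract does not spell out the steps of RUN, so the second function is only a generic illustration of a normalization built from matrix-vector products (power iteration followed by a rank-1 subtraction) and should not be read as the authors' algorithm. The names `newton_schulz_sqrt` and `rank1_shrink`, and the parameters `num_iters` and `eps`, are assumptions introduced here for illustration.

```python
import math


def matmul(a, b):
    """Matrix-matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def newton_schulz_sqrt(a, num_iters=20):
    """Approximate sqrt(a) for a symmetric positive-definite matrix `a`.

    Textbook coupled iteration: Y_0 = a / ||a||_F, Z_0 = I,
    T = (3I - Z Y) / 2, Y <- Y T, Z <- T Z.
    Each iteration costs three matrix-matrix products.
    """
    n = len(a)
    norm = math.sqrt(sum(x * x for row in a for x in row))  # Frobenius norm
    y = [[x / norm for x in row] for row in a]
    z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(num_iters):
        zy = matmul(z, y)
        t = [[0.5 * ((3.0 if i == j else 0.0) - zy[i][j])
              for j in range(n)] for i in range(n)]
        y = matmul(y, t)
        z = matmul(t, z)
    scale = math.sqrt(norm)  # undo the pre-scaling by ||a||_F
    return [[scale * x for x in row] for row in y]


def rank1_shrink(a, num_iters=50, eps=0.5):
    """Hypothetical illustration, NOT the paper's RUN algorithm.

    Estimates the dominant eigenvector of a symmetric matrix by power
    iteration (one matrix-vector product per step), then subtracts
    eps * lam * v v^T so the dominant eigenvalue shrinks by a factor
    of (1 - eps).
    """
    n = len(a)
    v = [1.0] * n
    for _ in range(num_iters):
        v = matvec(a, v)
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
    lam = sum(x * y for x, y in zip(v, matvec(a, v)))  # Rayleigh quotient
    return [[a[i][j] - eps * lam * v[i] * v[j] for j in range(n)]
            for i in range(n)]


a = [[2.0, 1.0], [1.0, 2.0]]
root = newton_schulz_sqrt(a)   # root times root should reproduce a
shrunk = rank1_shrink(a)       # dominant eigenvalue 3 is halved to 1.5
```

For a d x d matrix, each Newton-Schulz step above costs on the order of d^3 operations, while each power-iteration step costs on the order of d^2, which is the kind of gap the abstract attributes to replacing matrix-matrix products with matrix-vector products.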

Author information

Authors and Affiliations

Cognitive Computing Lab, Baidu Research, 10900 NE 8th Street, Bellevue, WA, 98004, USA
Tan Yu, Yunfeng Cai & Ping Li
Cognitive Computing Lab, Baidu Research, No. 10 Xibeiwang East Road, Beijing, 100085, China
Tan Yu, Yunfeng Cai & Ping Li

Corresponding author

Correspondence to Tan Yu.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

About this paper

Cite this paper

Yu, T., Cai, Y., Li, P. (2020). Toward Faster and Simpler Matrix Normalization via Rank-1 Update. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12364. Springer, Cham. https://doi.org/10.1007/978-3-030-58529-7_13

DOI: https://doi.org/10.1007/978-3-030-58529-7_13
Published: 13 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58528-0
Online ISBN: 978-3-030-58529-7
eBook Packages: Computer Science, Computer Science (R0)