Abstract
In the past few years, convolutional neural networks (CNNs) have been successfully applied to many computer vision tasks. Most of these networks extract only first-order information from input images. Second-order statistical information refers to the second-order correlations obtained by computing the covariance matrix, the Fisher information matrix, or the vector outer product over the local feature group along the channels. It has been shown that using second-order information on facial expression datasets better captures the distortion of facial-region features, but it also introduces more parameters, which can substantially increase the computational cost. In this article we propose a new CNN structure whose layers (i) incorporate first-order information into the covariance matrix; (ii) use eigenvalue vectors to measure the importance of feature channels; (iii) reduce the bilinear dimensionality of the parameter matrix; and (iv) perform Cholesky decomposition on the positive definite matrix to compress the second-order information matrix. By combining first-order and second-order information with the Cholesky compression strategy, the proposed method halves the number of parameters relative to the SPDNet model while achieving better results on facial expression classification tasks than both the corresponding first-order model and the reference second-order model.
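Steps (i) and (iv) can be illustrated with a minimal NumPy sketch. It is not the chapter's implementation: the mean-augmented embedding used here is one common way to fold first-order information into a covariance matrix and is an assumption, as are the function names, the regularizer eps, and the toy feature shape. The sketch only shows why keeping the lower-triangular Cholesky factor stores roughly half as many values as the full symmetric matrix.

```python
import numpy as np


def mean_augmented_covariance(features, eps=1e-5):
    """Build a (d+1) x (d+1) SPD matrix from n local descriptors of d channels.

    Assumed embedding (not confirmed by the source):
        [[cov + mu mu^T, mu],
         [mu^T,          1 ]]
    """
    n, d = features.shape
    mu = features.mean(axis=0)                 # first-order statistics
    centered = features - mu
    # Second-order statistics; eps * I keeps the matrix strictly positive definite.
    cov = centered.T @ centered / n + eps * np.eye(d)
    spd = np.empty((d + 1, d + 1))
    spd[:d, :d] = cov + np.outer(mu, mu)
    spd[:d, d] = mu
    spd[d, :d] = mu
    spd[d, d] = 1.0
    return spd


def cholesky_compress(spd):
    """Return the lower-triangular entries of the Cholesky factor as a vector."""
    lower = np.linalg.cholesky(spd)            # spd = lower @ lower.T
    rows, cols = np.tril_indices(spd.shape[0])
    return lower[rows, cols]


# Toy input: 196 spatial positions, 8 feature channels (hypothetical sizes).
rng = np.random.default_rng(0)
features = rng.standard_normal((196, 8))

spd = mean_augmented_covariance(features)
vec = cholesky_compress(spd)
print(spd.size, vec.size)  # full matrix entries vs. compressed entries
```

For a k x k matrix the compressed vector holds k(k+1)/2 values instead of k^2, and the original matrix can be recovered exactly by refilling the lower triangle and multiplying the factor by its transpose.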
Acknowledgements
This work is supported by NSF of Hebei Province (No. F2018201096), NSF of Guangdong Province (No. 2018A0303130026), the Key Science and Technology Foundation of the Educational Department of Hebei Province (ZD 2019021), and NSFC (No. 61976141).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Y., Zhang, J., Hua, Q. (2021). Second-Order Convolutional Neural Network Based on Cholesky Compression Strategy. In: Zhang, Y., Xu, Y., Tian, H. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2020. Lecture Notes in Computer Science, vol 12606. Springer, Cham. https://doi.org/10.1007/978-3-030-69244-5_30
DOI: https://doi.org/10.1007/978-3-030-69244-5_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69243-8
Online ISBN: 978-3-030-69244-5
eBook Packages: Computer Science, Computer Science (R0)