Abstract
Attention mechanisms can effectively improve the performance of mobile networks at a limited computational cost. However, existing attention methods extract importance from only one domain of the network, hindering further performance improvement. In this paper, we propose the Attention in Attention (AIA) mechanism, which integrates One-Dimensional Frequency Channel Attention (1D FCA) with Joint Coordinate Attention (JCA) to collaboratively adjust channel and coordinate weights in the frequency and spatial domains, respectively. Specifically, 1D FCA uses the 1D Discrete Cosine Transform (DCT) to adaptively extract and enhance the necessary channel information in the frequency domain. JCA uses explicit and implicit coordinate information to extract position features and embed them into the frequency channel attention. Extensive experiments on different datasets demonstrate that the proposed AIA mechanism effectively improves accuracy at only a limited computational cost.
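To make the frequency-domain channel weighting concrete, the following is a minimal illustrative sketch of DCT-based channel attention in the spirit of 1D FCA. It is not the authors' implementation; the pooling axis, the per-channel frequency assignment (`freqs`), and the simple sigmoid gate are all assumptions made for illustration.

```python
import numpy as np

def dct_1d_basis(n, k):
    # k-th 1D DCT-II basis vector of length n (frequency 0 reduces to average pooling)
    i = np.arange(n)
    return np.cos(np.pi * k * (2 * i + 1) / (2 * n))

def freq_channel_attention(x, freqs=None):
    """Illustrative sketch of 1D frequency channel attention.

    x: feature map of shape (C, H, W).
    freqs: one DCT frequency index per channel (hypothetical; defaults to 0).
    Returns the channel-reweighted feature map.
    """
    c, h, w = x.shape
    if freqs is None:
        freqs = np.zeros(c, dtype=int)
    # Pool the width away, leaving a 1D signal of length H per channel
    signal = x.mean(axis=2)                                   # (C, H)
    # Project each channel's signal onto its assigned DCT basis vector
    desc = np.array([signal[i] @ dct_1d_basis(h, freqs[i]) for i in range(c)])
    # Sigmoid gate -> per-channel weights in (0, 1)
    weights = 1.0 / (1.0 + np.exp(-desc))                     # (C,)
    return x * weights[:, None, None]

x = np.random.rand(8, 16, 16)
y = freq_channel_attention(x)
print(y.shape)  # (8, 16, 16)
```

With `freqs` fixed to zero this collapses to squeeze-and-excitation-style global average pooling; non-zero frequencies let different channels be summarized by different spectral components, which is the core idea the abstract attributes to 1D FCA.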
This work was supported in part by the Guangdong Shenzhen Joint Youth Fund under Grant 2021A151511074, in part by the NSFC fund 62176077, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2019B1515120055, in part by the Shenzhen Key Technical Project under Grant 2020N046, in part by the Shenzhen Fundamental Research Fund under Grant JCYJ20210324132210025, in part by the Medical Biometrics Perception and Analysis Engineering Laboratory, Shenzhen, China, and in part by the Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies under Grant 2022B1212010005.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, L., Feng, Q., Lu, Y., Liu, C., Lu, G. (2022). AIA: Attention in Attention Within Collaborate Domains. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13534. Springer, Cham. https://doi.org/10.1007/978-3-031-18907-4_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18906-7
Online ISBN: 978-3-031-18907-4
eBook Packages: Computer Science (R0)