Multi-scale Attention Consistency for Multi-label Image Classification

  • Conference paper
Neural Information Processing (ICONIP 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1332)

Abstract

Humans perceive images consistently under transformations such as flipping and scaling. Drawing on this perceptual consistency, researchers have found that a convolutional neural network (CNN) can discriminate better when it is forced to concentrate on the image regions that human visual perception would naturally attend to. The attention heatmap, a supplementary tool that reveals the regions a network chooses to focus on, has been developed and widely adopted for CNNs. Building on this notion of visual consistency, we propose a novel end-to-end trainable CNN architecture with multi-scale attention consistency. Specifically, our model takes an original image and its flipped counterpart as inputs and sends both through a single standard ResNet with additional attention-enhanced modules to generate semantically strong attention heatmaps. We also compute the distance between the multi-scale attention heatmaps of the two images and use it as an additional loss to help the network achieve better performance. Our network shows superior performance on the multi-label classification task and attains compelling results on the WIDER Attribute dataset.
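The consistency idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance function, the number of scales, or how the extra loss is weighted, so the mean squared distance, the per-scale averaging, and the `weight` parameter below are all assumptions, and every function name is hypothetical. Heatmaps are plain nested lists so that the example runs without any third-party library.

```python
from typing import List

Heatmap = List[List[float]]  # one 2-D attention map: rows x columns


def hflip(h: Heatmap) -> Heatmap:
    """Mirror a heatmap left to right."""
    return [row[::-1] for row in h]


def mse(a: Heatmap, b: Heatmap) -> float:
    """Mean squared distance between two equally shaped heatmaps.

    Assumption: the abstract only says "distance"; MSE is one plausible choice.
    """
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def consistency_loss(orig_maps: List[Heatmap], flip_maps: List[Heatmap]) -> float:
    """Average flip-consistency distance over scales (assumed aggregation).

    orig_maps[i] is the heatmap of the original image at scale i, and
    flip_maps[i] is the heatmap of the flipped image at the same scale.
    If attention is consistent, flipping the second map back recovers the first.
    """
    per_scale = [mse(o, hflip(f)) for o, f in zip(orig_maps, flip_maps)]
    return sum(per_scale) / len(per_scale)


def total_loss(cls_loss: float, orig_maps: List[Heatmap],
               flip_maps: List[Heatmap], weight: float = 1.0) -> float:
    """Classification loss plus the weighted consistency term (weight is hypothetical)."""
    return cls_loss + weight * consistency_loss(orig_maps, flip_maps)


# Two scales: a coarse 1x2 map and a finer 2x4 map (made-up values).
orig = [
    [[0.1, 0.9]],
    [[0.0, 0.2, 0.7, 1.0], [0.1, 0.3, 0.5, 0.8]],
]
consistent = [hflip(m) for m in orig]  # perfectly mirrored attention
inconsistent = orig                    # attention that ignores the flip
```

In this sketch, `consistency_loss(orig, consistent)` is zero because the flipped image's heatmaps are exact mirrors of the originals, while `consistency_loss(orig, inconsistent)` is positive and would penalize the network during training.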

Acknowledgement

This study was funded by the National Natural Science Foundation of China under grant nos. 61876154, 61876155, and U1804159; the Natural Science Foundation of Jiangsu Province under grant nos. BK20181189 and BK20181190; the Key Program Special Fund in XJTLU under grant nos. KSF-A-10, KSF-A-01, KSF-P-02, KSF-E-26, and KSF-T-06; and the XJTLU Research Development Fund under grant nos. RDF-16-02-49 and RDF-16-01-57.

Corresponding author

Correspondence to Kaizhu Huang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, H., Jin, X., Wang, Q., Huang, K. (2020). Multi-scale Attention Consistency for Multi-label Image Classification. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_93

  • DOI: https://doi.org/10.1007/978-3-030-63820-7_93

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63819-1

  • Online ISBN: 978-3-030-63820-7

  • eBook Packages: Computer Science, Computer Science (R0)
