Improving loss function for deep convolutional neural network applied in automatic image annotation

Salar, Ali; Ahmadi, Ali

doi:10.1007/s00371-023-02873-3

Original article
Published: 12 May 2023

Volume 40, pages 1617–1629, (2024)

The Visual Computer

Abstract

Automatic image annotation (AIA) describes the visual content of an image with a list of semantic labels. Typically, positive and negative tags in an image are massively imbalanced: an image carries far fewer positive labels than negative ones. This imbalance can hamper optimization and diminish the emphasis on gradients from positive labels during training. Whereas traditional annotation models mainly focus on model structure design, we propose a novel unsymmetrical loss function for a deep convolutional neural network (CNN) that treats positives and negatives differently, reducing the loss contribution from negative labels and highlighting the contribution of positive ones. During annotation, we specify a threshold for each label separately based on the Matthews correlation coefficient (MCC). Extensive experiments on high-vocabulary datasets such as Corel 5k, IAPR TC-12, and Esp Game show that, despite ignoring the semantic relationships between labels, our approach achieves remarkable results compared with state-of-the-art automatic image annotation models.
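The abstract does not give the loss formula or the threshold search, so the following is only a minimal sketch of the two ideas it names, not the authors' formulation. It assumes a focal-style binary cross-entropy with separate focusing exponents for positives and negatives (in the spirit of the asymmetric loss cited in the references) and a simple grid search that picks, for one label, the threshold with the highest MCC. The names asymmetric_bce, gamma_pos, gamma_neg, and best_threshold are hypothetical.

```python
import math


def asymmetric_bce(probs, targets, gamma_pos=0.0, gamma_neg=4.0, eps=1e-8):
    """Focal-style binary cross-entropy with separate focusing exponents.

    Illustrative stand-in, not the paper's loss: with gamma_neg > gamma_pos,
    easy negatives (small p) contribute very little, so positive labels
    carry relatively more weight in the total.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            total += -(p ** gamma_neg) * math.log(max(1.0 - p, eps))
    return total


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(scores, labels, candidates):
    """Return (threshold, MCC) for the candidate that maximizes MCC on one label."""
    best_t, best_m = candidates[0], -2.0  # MCC is never below -1
    for t in candidates:
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if s >= t and y == 1)
        fp = sum(1 for s, y in pairs if s >= t and y == 0)
        fn = sum(1 for s, y in pairs if s < t and y == 1)
        tn = sum(1 for s, y in pairs if s < t and y == 0)
        m = mcc(tp, fp, tn, fn)
        if m > best_m:
            best_t, best_m = t, m
    return best_t, best_m
```

In this sketch, setting both exponents to zero recovers plain binary cross-entropy, which makes the effect of the asymmetry easy to compare. Calling best_threshold once per label on held-out scores would yield one threshold per label, as the abstract describes.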

Data availability

The data that support the findings of this study are openly available in public data repositories: Corel 5k at https://www.kaggle.com/datasets/parhamsalar/corel5k (reference [33]); IAPR TC-12 at https://www.kaggle.com/datasets/parhamsalar/iaprtc12 (reference [34]); Esp Game at https://www.kaggle.com/datasets/parhamsalar/espgame (reference [35]).

Notes

You can find our implementation at: https://github.com/parham1998/Improving-Loss-Function-for-Deep-CNN-based-AIA.

References

Cheng, Q., Zhang, Q., Fu, P., Tu, C., Li, S.: A survey and analysis on automatic image annotation. Pattern Recognit 79, 242–259 (2018). https://doi.org/10.1016/j.patcog.2018.02.017
Tsoumakas, G., Katakis, I.: Multi-label classification. Int. J. Data Warehous. Min. 3(3), 1–13 (2007). https://doi.org/10.4018/jdwm.2007070101
Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Mach. Learn. 85(3), 333–359 (2011). https://doi.org/10.1007/s10994-011-5256-5
Zhu, G., Yan, S., Ma, Y.: Image tag refinement towards low-rank, content-tag prior and error sparsity. In: MM’10—Proceedings of the ACM Multimedia 2010 International Conference. 2010, pp. 461–470. https://doi.org/10.1145/1873951.1874028.
Jin, J., Nakayama, H.: Annotation order matters: recurrent image annotator for arbitrary length image tagging. In: Proceedings—International Conference on Pattern Recognition. pp. 2452–2457, (2016). https://doi.org/10.1109/ICPR.2016.7900004.
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv Prepr. arXiv1609.02907. (2016)
Liu, X., Xu, Q., Wang, N.: A survey on deep neural network-based image captioning. Vis. Comput. 35(3), 445–470 (2019). https://doi.org/10.1007/s00371-018-1566-y
Chen, Z.M., Wei, X.S., Wang, P., Guo, Y.: Multi-label image recognition with graph convolutional networks. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2019, pp. 5172–5181. (2019). https://doi.org/10.1109/CVPR.2019.00532
Lotfi, F., Jamzad, M., Beigy, H.: Automatic image annotation using tag relations and graph convolutional networks. In: Proceedings of the 5th international conference on pattern recognition and image analysis, IPRIA 2021, pp. 1–6. (2021). https://doi.org/10.1109/IPRIA53572.2021.9483536
Szegedy, C. et al.: Going deeper with convolutions. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 07–12, pp. 1–9. (2015). https://doi.org/10.1109/CVPR.2015.7298594
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016, pp. 770–778. (2016). https://doi.org/10.1109/CVPR.2016.90
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-ResNet and the impact of residual connections on learning. In: 31st AAAI Conference on Artificial Intelligence, AAAI 2017, 2017, vol. 31, no. 1, pp. 4278–4284. https://doi.org/10.1609/aaai.v31i1.11231
Gong, Y., Jia, Y., Leung, T., Toshev, A., Ioffe, S.: Deep convolutional ranking for multilabel image annotation. arXiv Prepr. arXiv1312.4894. (2013)
Li, Y., Song, Y., Luo, J.: Improving pairwise ranking for multi-label image classification. In: Proceedings—30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017, pp. 1837–1845. (2017). https://doi.org/10.1109/CVPR.2017.199
Niu, Y., Lu, Z., Wen, J.R., Xiang, T., Chang, S.F.: Multi-modal multi-scale deep learning for large-scale image annotation. IEEE Trans. Image Process. 28(4), 1720–1731 (2019). https://doi.org/10.1109/TIP.2018.2881928
Ke, X., Zou, J., Niu, Y.: End-to-end automatic image annotation based on deep CNN and multi-label data augmentation. IEEE Trans. Multimed. 21(8), 2093–2106 (2019). https://doi.org/10.1109/TMM.2019.2895511
Khatchatoorian, A.G., Jamzad, M.: Architecture to improve the accuracy of automatic image annotation systems. IET Comput. Vis. 14(5), 214–223 (2020). https://doi.org/10.1049/iet-cvi.2019.0500
Lin, T. Y., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, vol. 2017, pp. 2999–3007. (2017). https://doi.org/10.1109/ICCV.2017.324
Jeon, J., Lavrenko, V., Manmatha, R.: Automatic image annotation and retrieval using cross-media relevance models. In: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, pp. 119–126. (2003). https://doi.org/10.1145/860435.860459
Wang, M., Zhou, X.D., Zhang, J.Q., Xu, H.T., Le Shi, B.: Image auto-annotation via an extended generative language model. Ruan Jian Xue Bao/Journal Softw. 19(9), 2449–2460 (2008). https://doi.org/10.3724/SP.J.1001.2008.02449
Makadia, A., Pavlovic, V., Kumar, S.: A New Baseline for Image Annotation. In: European Conference on Computer Vision, pp. 316–329. Springer, (2008). https://doi.org/10.1007/978-3-540-88690-7_24
Verma, Y., Jawahar, C.V.: Image annotation using metric learning in semantic neighbourhoods. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7574 LNCS, no. PART 3, pp. 836–849. (2012). https://doi.org/10.1007/978-3-642-33712-3_60
Murthy, V.N., Can, E.F., Manmatha, R.: A hybrid model for automatic image annotation. In: ICMR 2014—Proceedings of the ACM international conference on multimedia retrieval 2014, pp. 369–376. (2014). https://doi.org/10.1145/2578726.2578774
Feng, L., Bhanu, B.: Semantic concept co-occurrence patterns for image annotation and retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 38(4), 785–799 (2016). https://doi.org/10.1109/TPAMI.2015.2469281
Wu, B., Lyu, S., Ghanem, B.: ML-MG: Multi-label learning with missing labels using a mixed graph. In: 2015 IEEE International Conference on Computer Vision (ICCV), vol. 2015 Inter, pp. 4157–4165. (2015). https://doi.org/10.1109/ICCV.2015.473
Wu, B., Liu, Z., Wang, S., Hu, B.G., Ji, Q.: Multi-label learning with missing labels. In: Proceedings—International Conference on Pattern Recognition, pp. 1964–1968. (2014). https://doi.org/10.1109/ICPR.2014.343
Murthy, V.N., Maji, S., Manmatha, R.: Automatic image annotation using deep learning representations. In: ICMR 2015—Proceedings of the 2015 ACM International Conference on Multimedia Retrieval, pp. 603–606. (2015). https://doi.org/10.1145/2671188.2749391
Xue, L., Jiang, D., Wang, R., Yang, J., Hu, M.: Learning semantic dependencies with channel correlation for multi-label classification. Vis. Comput. 36(7), 1325–1335 (2020). https://doi.org/10.1007/s00371-019-01731-5
Wu, B., Chen, W., Sun, P., Liu, Ghanem, B., Lyu, S.: Tagging like humans: diverse and distinct image annotation. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. pp. 7967–7975. (2018). https://doi.org/10.1109/CVPR.2018.00831
Ridnik, T., et al.: Asymmetric loss for multi-label classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 82–91. (2021). https://doi.org/10.1109/ICCV48922.2021.00015
Zhang, Y. et al.: Simple and robust loss design for multi-label learning with missing labels. arXiv Prepr. arXiv2112.07368. (2021). Available: http://arxiv.org/abs/2112.07368
Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21(1), 6 (2020). https://doi.org/10.1186/s12864-019-6413-7
Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.A.: Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 2353, pp. 97–112. (2002). https://doi.org/10.1007/3-540-47979-1_7
Grubinger, M.: Analysis and evaluation of visual information systems performance. (2007)
Von Ahn, L., Dabbish, L.: Labeling images with a computer game. In: Conference on Human Factors in Computing Systems—Proceedings, pp. 319–326. (2004). https://doi.org/10.1145/985692.985733
Ridnik, T., Lawen, H., Noy, A., Ben, E., Sharir, B.G., Friedman, I.: TResNet: High performance GPU-dedicated architecture. In: Proceedings—2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021, pp. 1399–1408. (2021). https://doi.org/10.1109/WACV48630.2021.00144
Smith, L.N., Topin, N.: Super-convergence: very fast training of neural networks using large learning rates. In: Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, vol. 11006, p. 36. (2019). https://doi.org/10.1117/12.2520589
Feng, S.L., Manmatha, R., Lavrenko, V.: Multiple Bernoulli relevance models for image and video annotation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1002–1009. (2004). https://doi.org/10.1109/cvpr.2004.1315274
Jing, X.Y., Wu, F., Li, Z., Hu, R., Zhang, D.: Multi-label dictionary learning for image annotation. IEEE Trans. Image Process. 25(6), 2712–2725 (2016). https://doi.org/10.1109/TIP.2016.2549459
Zhang, W., Hu, H., Hu, H.: Training visual-semantic embedding network for boosting automatic image annotation. Neural Process. Lett. 48(3), 1503–1519 (2018). https://doi.org/10.1007/s11063-017-9753-9
Khatchatoorian, A.G., Jamzad, M.: An image annotation rectifying method based on deep features. In: ACM International Conference Proceeding Series, pp. 88–92. (2018). https://doi.org/10.1145/3193025.3193035
Ma, Y., Liu, Y., Xie, Q., Li, L.: CNN-feature based automatic image annotation method. Multimed. Tools Appl. 78(3), 3767–3780 (2019). https://doi.org/10.1007/s11042-018-6038-x
Li, Z., Lin, L., Zhang, C., Ma, H., Zhao, W., Shi, Z.: A Semi-supervised learning approach based on adaptive weighted fusion for automatic image annotation. ACM Trans. Multimed. Comput. Commun. Appl. 17(1), 1–23 (2021). https://doi.org/10.1145/3426974

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Authors and Affiliations

Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
Ali Salar & Ali Ahmadi

Authors

Ali Salar
Ali Ahmadi

Contributions

AS: Conceptualization, Methodology, Writing-original-draft, Formal analysis, Visualization. AA: Methodology, Validation, Resources, Writing-review & editing, Supervision.

Corresponding author

Correspondence to Ali Ahmadi.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Salar, A., Ahmadi, A. Improving loss function for deep convolutional neural network applied in automatic image annotation. Vis Comput 40, 1617–1629 (2024). https://doi.org/10.1007/s00371-023-02873-3

Accepted: 12 April 2023
Published: 12 May 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s00371-023-02873-3
