
Improving loss function for deep convolutional neural network applied in automatic image annotation

  • Original article
  • Published in The Visual Computer

Abstract

Automatic image annotation (AIA) is a mechanism for describing the visual content of an image with a list of semantic labels. Typically, there is a massive imbalance between positive and negative tags in an image: a picture carries far fewer positive labels than negative ones. This imbalance can hamper optimization and drown out the gradients coming from positive labels during training. Whereas traditional annotation models focus mainly on model structure design, we propose a novel asymmetric loss function for a deep convolutional neural network (CNN) that operates differently on positives and negatives, reducing the loss contribution of negative labels while emphasizing the contribution of positive ones. During annotation, we set a separate decision threshold for each label based on the Matthews correlation coefficient (MCC). Extensive experiments on high-vocabulary datasets such as Corel 5k, IAPR TC-12, and Esp Game show that, despite ignoring the semantic relationships between labels, our approach achieves strong results compared with state-of-the-art automatic image annotation models.
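
To make the loss design concrete, below is a minimal PyTorch sketch of an asymmetric binary loss in the spirit described above, patterned after the asymmetric loss of Ridnik et al. [30] that this line of work draws on. The class name, the focusing exponents, the probability margin, and the default values are illustrative assumptions, not the exact formulation from the paper:

```python
import torch
import torch.nn as nn

class AsymmetricLossSketch(nn.Module):
    """Illustrative asymmetric multi-label loss: negatives receive a larger
    focusing exponent and a probability margin, so easy negatives contribute
    little and the scarce positive labels dominate the gradient."""

    def __init__(self, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
        super().__init__()
        self.gamma_pos = gamma_pos  # focusing exponent for positive labels
        self.gamma_neg = gamma_neg  # larger exponent down-weights easy negatives
        self.margin = margin        # shifts negative probabilities toward zero loss
        self.eps = eps

    def forward(self, logits, targets):
        # logits, targets: (batch_size, num_labels); targets are 0/1 indicators
        p = torch.sigmoid(logits)
        # margin-shifted probability for negatives; very easy negatives yield zero loss
        p_neg = (p - self.margin).clamp(min=0)
        loss_pos = targets * (1 - p) ** self.gamma_pos * torch.log(p.clamp(min=self.eps))
        loss_neg = (1 - targets) * p_neg ** self.gamma_neg * torch.log((1 - p_neg).clamp(min=self.eps))
        return -(loss_pos + loss_neg).sum(dim=1).mean()
```

With gamma_pos = gamma_neg = 0 and margin = 0 this reduces to ordinary binary cross-entropy; raising gamma_neg and the margin is what suppresses the flood of easy negatives that the abstract identifies as the core imbalance problem.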


Data availability

The data that support the findings of this study are openly available in public repositories: Corel-5k at https://www.kaggle.com/datasets/parhamsalar/corel5k (reference [33]), IAPR TC-12 at https://www.kaggle.com/datasets/parhamsalar/iaprtc12 (reference [34]), and Esp Game at https://www.kaggle.com/datasets/parhamsalar/espgame (reference [35]).

Notes

  1. You can find our implementation at: https://github.com/parham1998/Improving-Loss-Function-for-Deep-CNN-based-AIA.
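
The abstract also mentions choosing a separate decision threshold per label based on the Matthews correlation coefficient (MCC). One plausible reading of that step is a per-label sweep over held-out validation scores; the sketch below illustrates it, where the candidate-threshold grid, the 0.5 fallback, and the function name are our assumptions, not details taken from the paper or the repository:

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

def per_label_mcc_thresholds(scores, targets, grid=np.linspace(0.05, 0.95, 19)):
    """For each label, sweep candidate thresholds over validation scores
    and keep the one that maximizes the MCC for that label."""
    # scores: (num_samples, num_labels) predicted probabilities
    # targets: (num_samples, num_labels) ground-truth 0/1 annotations
    num_labels = targets.shape[1]
    thresholds = np.full(num_labels, 0.5)  # fallback when no threshold improves MCC
    for j in range(num_labels):
        best_mcc = 0.0
        for t in grid:
            mcc = matthews_corrcoef(targets[:, j], (scores[:, j] >= t).astype(int))
            if mcc > best_mcc:
                best_mcc, thresholds[j] = mcc, t
    return thresholds

# At test time, label j is assigned to an image when its score exceeds thresholds[j].
```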

References

  1. Cheng, Q., Zhang, Q., Fu, P., Tu, C., Li, S.: A survey and analysis on automatic image annotation. Pattern Recognit. 79, 242–259 (2018). https://doi.org/10.1016/j.patcog.2018.02.017

  2. Tsoumakas, G., Katakis, I.: Multi-label classification. Int. J. Data Warehous. Min. 3(3), 1–13 (2007). https://doi.org/10.4018/jdwm.2007070101

  3. Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Mach. Learn. 85(3), 333–359 (2011). https://doi.org/10.1007/s10994-011-5256-5

  4. Zhu, G., Yan, S., Ma, Y.: Image tag refinement towards low-rank, content-tag prior and error sparsity. In: Proceedings of the 18th ACM International Conference on Multimedia (MM '10), pp. 461–470 (2010). https://doi.org/10.1145/1873951.1874028

  5. Jin, J., Nakayama, H.: Annotation order matters: recurrent image annotator for arbitrary length image tagging. In: Proceedings of the International Conference on Pattern Recognition (ICPR), pp. 2452–2457 (2016). https://doi.org/10.1109/ICPR.2016.7900004

  6. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

  7. Liu, X., Xu, Q., Wang, N.: A survey on deep neural network-based image captioning. Vis. Comput. 35(3), 445–470 (2019). https://doi.org/10.1007/s00371-018-1566-y

  8. Chen, Z.M., Wei, X.S., Wang, P., Guo, Y.: Multi-label image recognition with graph convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5172–5181 (2019). https://doi.org/10.1109/CVPR.2019.00532

  9. Lotfi, F., Jamzad, M., Beigy, H.: Automatic image annotation using tag relations and graph convolutional networks. In: Proceedings of the 5th International Conference on Pattern Recognition and Image Analysis (IPRIA 2021), pp. 1–6 (2021). https://doi.org/10.1109/IPRIA53572.2021.9483536

  10. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015). https://doi.org/10.1109/CVPR.2015.7298594

  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

  12. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), vol. 31, no. 1, pp. 4278–4284 (2017). https://doi.org/10.1609/aaai.v31i1.11231

  13. Gong, Y., Jia, Y., Leung, T., Toshev, A., Ioffe, S.: Deep convolutional ranking for multilabel image annotation. arXiv preprint arXiv:1312.4894 (2013)

  14. Li, Y., Song, Y., Luo, J.: Improving pairwise ranking for multi-label image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1837–1845 (2017). https://doi.org/10.1109/CVPR.2017.199

  15. Niu, Y., Lu, Z., Wen, J.R., Xiang, T., Chang, S.F.: Multi-modal multi-scale deep learning for large-scale image annotation. IEEE Trans. Image Process. 28(4), 1720–1731 (2019). https://doi.org/10.1109/TIP.2018.2881928

  16. Ke, X., Zou, J., Niu, Y.: End-to-end automatic image annotation based on deep CNN and multi-label data augmentation. IEEE Trans. Multimed. 21(8), 2093–2106 (2019). https://doi.org/10.1109/TMM.2019.2895511

  17. Khatchatoorian, A.G., Jamzad, M.: Architecture to improve the accuracy of automatic image annotation systems. IET Comput. Vis. 14(5), 214–223 (2020). https://doi.org/10.1049/iet-cvi.2019.0500

  18. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007 (2017). https://doi.org/10.1109/ICCV.2017.324

  19. Jeon, J., Lavrenko, V., Manmatha, R.: Automatic image annotation and retrieval using cross-media relevance models. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 119–126 (2003). https://doi.org/10.1145/860435.860459

  20. Wang, M., Zhou, X.D., Zhang, J.Q., Xu, H.T., Le Shi, B.: Image auto-annotation via an extended generative language model. Ruan Jian Xue Bao/Journal Softw. 19(9), 2449–2460 (2008). https://doi.org/10.3724/SP.J.1001.2008.02449

  21. Makadia, A., Pavlovic, V., Kumar, S.: A new baseline for image annotation. In: European Conference on Computer Vision (ECCV), pp. 316–329. Springer (2008). https://doi.org/10.1007/978-3-540-88690-7_24

  22. Verma, Y., Jawahar, C.V.: Image annotation using metric learning in semantic neighbourhoods. In: Lecture Notes in Computer Science, vol. 7574, Part 3, pp. 836–849 (2012). https://doi.org/10.1007/978-3-642-33712-3_60

  23. Murthy, V.N., Can, E.F., Manmatha, R.: A hybrid model for automatic image annotation. In: Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR 2014), pp. 369–376 (2014). https://doi.org/10.1145/2578726.2578774

  24. Feng, L., Bhanu, B.: Semantic concept co-occurrence patterns for image annotation and retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 38(4), 785–799 (2016). https://doi.org/10.1109/TPAMI.2015.2469281

  25. Wu, B., Lyu, S., Ghanem, B.: ML-MG: multi-label learning with missing labels using a mixed graph. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4157–4165 (2015). https://doi.org/10.1109/ICCV.2015.473

  26. Wu, B., Liu, Z., Wang, S., Hu, B.G., Ji, Q.: Multi-label learning with missing labels. In: Proceedings of the International Conference on Pattern Recognition (ICPR), pp. 1964–1968 (2014). https://doi.org/10.1109/ICPR.2014.343

  27. Murthy, V.N., Maji, S., Manmatha, R.: Automatic image annotation using deep learning representations. In: Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR 2015), pp. 603–606 (2015). https://doi.org/10.1145/2671188.2749391

  28. Xue, L., Jiang, D., Wang, R., Yang, J., Hu, M.: Learning semantic dependencies with channel correlation for multi-label classification. Vis. Comput. 36(7), 1325–1335 (2020). https://doi.org/10.1007/s00371-019-01731-5

  29. Wu, B., Chen, W., Sun, P., Liu, W., Ghanem, B., Lyu, S.: Tagging like humans: diverse and distinct image annotation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7967–7975 (2018). https://doi.org/10.1109/CVPR.2018.00831

  30. Ridnik, T., et al.: Asymmetric loss for multi-label classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 82–91. (2021). https://doi.org/10.1109/ICCV48922.2021.00015

  31. Zhang, Y., et al.: Simple and robust loss design for multi-label learning with missing labels. arXiv preprint arXiv:2112.07368 (2021). Available at: http://arxiv.org/abs/2112.07368

  32. Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21(1), 6 (2020). https://doi.org/10.1186/s12864-019-6413-7

  33. Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.A.: Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In: Lecture Notes in Computer Science, vol. 2353, pp. 97–112 (2002). https://doi.org/10.1007/3-540-47979-1_7

  34. Grubinger, M.: Analysis and evaluation of visual information systems performance. (2007)

  35. Von Ahn, L., Dabbish, L.: Labeling images with a computer game. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 319–326 (2004). https://doi.org/10.1145/985692.985733

  36. Ridnik, T., Lawen, H., Noy, A., Ben, E., Sharir, B.G., Friedman, I.: TResNet: high performance GPU-dedicated architecture. In: Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1399–1408 (2021). https://doi.org/10.1109/WACV48630.2021.00144

  37. Smith, L.N., Topin, N.: Super-convergence: very fast training of neural networks using large learning rates. In: Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, vol. 11006, p. 36. (2019). https://doi.org/10.1117/12.2520589

  38. Feng, S.L., Manmatha, R., Lavrenko, V.: Multiple Bernoulli relevance models for image and video annotation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1002–1009. (2004). https://doi.org/10.1109/cvpr.2004.1315274

  39. Jing, X.Y., Wu, F., Li, Z., Hu, R., Zhang, D.: Multi-label dictionary learning for image annotation. IEEE Trans. Image Process. 25(6), 2712–2725 (2016). https://doi.org/10.1109/TIP.2016.2549459

  40. Zhang, W., Hu, H., Hu, H.: Training visual-semantic embedding network for boosting automatic image annotation. Neural Process. Lett. 48(3), 1503–1519 (2018). https://doi.org/10.1007/s11063-017-9753-9

  41. Khatchatoorian, A.G., Jamzad, M.: An image annotation rectifying method based on deep features. In: ACM International Conference Proceeding Series, pp. 88–92. (2018). https://doi.org/10.1145/3193025.3193035

  42. Ma, Y., Liu, Y., Xie, Q., Li, L.: CNN-feature based automatic image annotation method. Multimed. Tools Appl. 78(3), 3767–3780 (2019). https://doi.org/10.1007/s11042-018-6038-x

  43. Li, Z., Lin, L., Zhang, C., Ma, H., Zhao, W., Shi, Z.: A semi-supervised learning approach based on adaptive weighted fusion for automatic image annotation. ACM Trans. Multimed. Comput. Commun. Appl. 17(1), 1–23 (2021). https://doi.org/10.1145/3426974

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Authors and Affiliations

Authors

Contributions

AS: Conceptualization, Methodology, Writing – original draft, Formal analysis, Visualization. AA: Methodology, Validation, Resources, Writing – review & editing, Supervision.

Corresponding author

Correspondence to Ali Ahmadi.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Salar, A., Ahmadi, A. Improving loss function for deep convolutional neural network applied in automatic image annotation. Vis Comput 40, 1617–1629 (2024). https://doi.org/10.1007/s00371-023-02873-3

  • DOI: https://doi.org/10.1007/s00371-023-02873-3
