Captioning model based on meta-learning using prior-convergence knowledge for explainable images

  • Original Article
  • Published in Personal and Ubiquitous Computing

Abstract

Big data comprises a variety of data types, including images and text. Image-based research on face recognition and object detection, in particular, has been conducted in diverse areas. Deep learning requires a massive amount of data to learn a model accurately, but the amount of data collected differs across areas, so there is often too little data for deep-learning analysis. Accordingly, a method is needed that learns a model effectively and predicts results accurately from a small amount of data. Captions and tags are also generated to obtain image information. Tagging expresses an image with individual words, whereas captioning creates a sentence that connects those words; captioning therefore yields more detailed image information than tagging. However, when a caption is created from words, end-to-end models suffer lower performance if labeled data are insufficient. As a solution to this problem, meta-learning, which can work with a small amount of data, is applied. This study proposes a captioning model based on meta-learning using prior-convergence knowledge for explainable images. The proposed method collects multimodal image data for predicting image information. From the collected data, attributes representing object information and context information are used. Then, with a small amount of data, meta-learning is applied in a bottom-up approach to create a sentence for captioning, which addresses the data-shortage problem. Lastly, for the extraction of image features, a convolutional network and an LSTM for captioning are established, and the basis for an explanation is generated through a reverse operation. The generated basis is an image object, and an appropriate explanatory sentence is displayed for each particular object. Performance is evaluated for accuracy in two ways.
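The encoder–decoder pipeline described above (convolutional image features fed to an LSTM that emits the caption word by word) can be illustrated with a minimal self-contained sketch. The dimensions, random weights, and single-step cell below are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM decoding step: x is the current input (e.g. CNN image
    features or the previous word embedding); (h, c) is the carried state."""
    H = h.shape[0]
    z = W @ x + U @ h + b                  # stacked gate pre-activations, shape (4H,)
    i = sigmoid(z[:H])                     # input gate
    f = sigmoid(z[H:2 * H])                # forget gate
    o = sigmoid(z[2 * H:3 * H])            # output gate
    g = np.tanh(z[3 * H:])                 # candidate cell update
    c = f * c + i * g                      # forget old state, write new content
    h = o * np.tanh(c)                     # exposed hidden state -> word logits
    return h, c

# Toy dimensions: 8-dim "image feature" input, 16-dim hidden state.
rng = np.random.default_rng(0)
D, H = 8, 16
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)   # first decode step
```

In a full captioning model this step would be repeated per output word, with `h` projected to vocabulary logits at each step.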
First, the BLEU score is evaluated with and without meta-learning. Second, the proposed captioning model based on prior knowledge is compared with an RNN-based captioning model and a bidirectional RNN-based captioning model in terms of BLEU score. Through the proposed method, the LSTM's bottom-up approach reduces the cost of improving image resolution and solves the data-shortage problem through meta-learning. In addition, it is possible to find the basis of image information from the generated captions and to describe photo information more accurately based on XAI.
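As a concrete illustration of the metric used in both evaluations, the sketch below implements a simplified sentence-level BLEU: a single reference, the geometric mean of modified n-gram precisions, and a brevity penalty. Published evaluations typically use a standardized tool such as sacreBLEU; this is only a didactic approximation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU with one reference: geometric mean of
    modified n-gram precisions (n = 1..max_n) times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0                          # any empty precision zeroes the score
    log_p = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_p)

score = bleu("a dog runs fast across the field".split(),
             "a dog runs fast across the field".split())
```

A perfect match yields a score of 1.0; candidates shorter than the reference are penalized by the brevity penalty.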





Acknowledgements

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-0-01405) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation). Additionally, this work was supported by Kyonggi University's Graduate Research Assistantship 2021.

Author information


Corresponding author

Correspondence to Kyungyong Chung.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Baek, JW., Chung, K. Captioning model based on meta-learning using prior-convergence knowledge for explainable images. Pers Ubiquit Comput 27, 1191–1199 (2023). https://doi.org/10.1007/s00779-021-01558-9

