Abstract
Big data comprises a variety of data types, including images and text. In particular, image-based research on face recognition and object detection has been conducted in diverse areas. Deep learning requires a massive amount of data to learn an accurate model, but the amount of data collected differs from area to area, so the data available for deep learning analysis are often insufficient. Accordingly, a method is needed that learns a model effectively and predicts results accurately from a small amount of data. In addition, captions and tags are generated to obtain image information. Tagging expresses an image with individual words, whereas captioning produces a sentence that connects those words, so captioning provides more detailed image information than tagging. However, when a caption is generated with an end-to-end model, performance degrades if labeled data are insufficient. As a solution to this problem, meta-learning, which can learn from a small amount of data, is applied. This study proposes a captioning model based on meta-learning using prior-convergence knowledge for explainable images. The proposed method collects multimodal image data for predicting image information. From the collected data, attributes representing object information and context information are used. Then, with a small amount of data, meta-learning is applied in a bottom-up approach to generate a caption sentence, which addresses the data-shortage problem. Lastly, for the extraction of image features, a convolutional network is combined with an LSTM for captioning, and the basis for the explanation is generated through a reverse operation. The generated basis is an image object, and an appropriate explanatory sentence is displayed for each object. Performance is evaluated in two ways with respect to accuracy. First, the BLEU score is evaluated with and without meta-learning. Second, the proposed prior-knowledge-based captioning model, an RNN-based captioning model, and a bidirectional RNN-based captioning model are compared in terms of BLEU score. Through the proposed method, the LSTM's bottom-up approach reduces the cost of improving image resolution and solves the data-shortage problem through meta-learning. In addition, it is possible to find the basis of image information using captions and to describe image information more accurately based on XAI.
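The abstract describes a convolutional encoder that extracts image features, an LSTM decoder that generates the caption, and a BLEU-based evaluation. The paper does not publish code, so the sketch below is only a minimal illustration of that encoder-decoder pattern in Python/PyTorch; the ResNet-18 backbone, layer sizes, class names, and NLTK's corpus_bleu are assumptions made for the example rather than the authors' implementation, and the meta-learning and explanation-generation (XAI) steps are omitted.

```python
# Minimal sketch (not the authors' code): CNN encoder + LSTM decoder for
# image captioning, plus a BLEU evaluation. All names and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models
from nltk.translate.bleu_score import corpus_bleu


class CNNEncoder(nn.Module):
    """Extracts a fixed-length image feature with a pretrained CNN."""
    def __init__(self, embed_dim=256):
        super().__init__()
        resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Drop the final classification layer; keep the pooled feature map.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_dim)

    def forward(self, images):                      # images: (B, 3, H, W)
        feats = self.backbone(images).flatten(1)    # (B, 512)
        return self.fc(feats)                       # (B, embed_dim)


class LSTMDecoder(nn.Module):
    """Generates a caption word by word, conditioned on the image feature."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feat, captions):          # captions: (B, T) token ids
        tokens = self.embed(captions)               # (B, T, embed_dim)
        # Prepend the image feature as the first "word" of the sequence.
        inputs = torch.cat([img_feat.unsqueeze(1), tokens], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                     # (B, T+1, vocab_size)


# BLEU over generated captions vs. reference captions (tokenized), as used
# in the paper's evaluation protocol; the sentences here are toy examples.
references = [[["a", "dog", "runs", "on", "grass"]]]
hypotheses = [["a", "dog", "runs", "in", "grass"]]
print(corpus_bleu(references, hypotheses))
```

In the setting the abstract describes, such a decoder would additionally be adapted from a small number of labeled captions via meta-learning before the BLEU comparison against the RNN-based and bidirectional RNN-based baselines.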
Acknowledgements
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-0-01405) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation). Additionally, this work was supported by Kyonggi University's Graduate Research Assistantship 2021.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Baek, JW., Chung, K. Captioning model based on meta-learning using prior-convergence knowledge for explainable images. Pers Ubiquit Comput 27, 1191–1199 (2023). https://doi.org/10.1007/s00779-021-01558-9