Abstract
Big data comprises a variety of data types, including images and text. In particular, image-based research on face recognition and object detection has been conducted in diverse areas. Deep learning requires a massive amount of data to learn an accurate model, but the amount of data collected differs from area to area, so the data available for deep learning analysis are often insufficient. Accordingly, a method is needed that learns a model effectively and predicts results accurately from a small amount of data. In addition, captions and tags are generated to obtain image information. Tagging expresses an image with individual words, whereas captioning produces a sentence that connects those words, so captioning provides more detailed image information than tagging. However, when a caption is generated with an end-to-end model, performance degrades if labeled data are insufficient. As a solution to this problem, meta-learning, which can learn from a small amount of data, is applied. This study proposes a captioning model based on meta-learning using prior-convergence knowledge for explainable images. The proposed method collects multimodal image data for predicting image information. From the collected data, attributes representing object information and context information are used. Then, with a small amount of data, meta-learning is applied in a bottom-up approach to generate a caption sentence, which addresses the data-shortage problem. Lastly, for the extraction of image features, a convolutional network is combined with an LSTM for captioning, and the basis for the explanation is generated through a reverse operation. The generated basis is an image object, and an appropriate explanatory sentence is displayed for each object. Performance is evaluated in two ways with respect to accuracy. First, the BLEU score is evaluated with and without meta-learning. Second, the proposed prior-knowledge-based captioning model, an RNN-based captioning model, and a bidirectional RNN-based captioning model are compared in terms of BLEU score. Through the proposed method, the LSTM's bottom-up approach reduces the cost of improving image resolution and solves the data-shortage problem through meta-learning. In addition, it is possible to find the basis of image information using captions and to describe image information more accurately based on XAI.
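The abstract describes a convolutional encoder that extracts image features, an LSTM decoder that generates the caption, and a BLEU-based evaluation. The paper does not publish code, so the sketch below is only a minimal illustration of that encoder-decoder pattern in Python/PyTorch; the ResNet-18 backbone, layer sizes, class names, and NLTK's corpus_bleu are assumptions made for the example rather than the authors' implementation, and the meta-learning and explanation-generation (XAI) steps are omitted.

```python
# Minimal sketch (not the authors' code): CNN encoder + LSTM decoder for
# image captioning, plus a BLEU evaluation. All names and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models
from nltk.translate.bleu_score import corpus_bleu


class CNNEncoder(nn.Module):
    """Extracts a fixed-length image feature with a pretrained CNN."""
    def __init__(self, embed_dim=256):
        super().__init__()
        resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Drop the final classification layer; keep the pooled feature map.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_dim)

    def forward(self, images):                      # images: (B, 3, H, W)
        feats = self.backbone(images).flatten(1)    # (B, 512)
        return self.fc(feats)                       # (B, embed_dim)


class LSTMDecoder(nn.Module):
    """Generates a caption word by word, conditioned on the image feature."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feat, captions):          # captions: (B, T) token ids
        tokens = self.embed(captions)               # (B, T, embed_dim)
        # Prepend the image feature as the first "word" of the sequence.
        inputs = torch.cat([img_feat.unsqueeze(1), tokens], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                     # (B, T+1, vocab_size)


# BLEU over generated captions vs. reference captions (tokenized), as used
# in the paper's evaluation protocol; the sentences here are toy examples.
references = [[["a", "dog", "runs", "on", "grass"]]]
hypotheses = [["a", "dog", "runs", "in", "grass"]]
print(corpus_bleu(references, hypotheses))
```

In the setting the abstract describes, such a decoder would additionally be adapted from a small number of labeled captions via meta-learning before the BLEU comparison against the RNN-based and bidirectional RNN-based baselines.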
Acknowledgements
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-0-01405) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation). Additionally, this work was supported by Kyonggi University's Graduate Research Assistantship 2021.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Baek, JW., Chung, K. Captioning model based on meta-learning using prior-convergence knowledge for explainable images. Pers Ubiquit Comput 27, 1191–1199 (2023). https://doi.org/10.1007/s00779-021-01558-9