Benefit from AMR: Image Captioning with Explicit Relations and Endogenous Knowledge

  • Conference paper
  • In: Web and Big Data (APWeb-WAIM 2023)

Abstract

Most recent image captioning methods explore implicit relationships among objects through object-based visual feature modeling, but fail to capture explicit relations or achieve semantic association. To tackle these problems, we present a novel method based on Abstract Meaning Representation (AMR). Specifically, in addition to implicit relationship modeling of visual features, we design an AMR generator that extracts explicit relations from images and models these relations during caption generation. In addition, we construct an AMR-based endogenous knowledge graph, which provides prior knowledge for semantic association and strengthens the semantic expression ability of the captioning model without any external resources. Extensive experiments on the public MS COCO dataset show that the AMR-based explicit semantic features and the associated semantic features further boost image captioning and lead to higher-quality captions.
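To make the endogenous-knowledge idea concrete, the sketch below is a minimal illustration (not the authors' implementation) of how AMR-style (source, relation, target) triples extracted from training captions could be aggregated into a weighted concept graph and then queried for associated concepts; the function names (build_endogenous_kg, associate) and the toy triples are hypothetical.

    from collections import defaultdict

    def build_endogenous_kg(caption_amr_triples):
        """Aggregate AMR-style (source, relation, target) triples from all
        training captions into a weighted concept graph, where an edge weight
        counts how often two concepts are linked by that relation."""
        kg = defaultdict(lambda: defaultdict(int))
        for triples in caption_amr_triples:
            for src, rel, tgt in triples:
                kg[src][(rel, tgt)] += 1
        return kg

    def associate(kg, concept, top_k=3):
        """Retrieve the top-k (relation, concept) pairs most frequently
        attached to `concept`, i.e. prior knowledge for semantic association."""
        neighbors = kg.get(concept, {})
        return sorted(neighbors.items(), key=lambda kv: -kv[1])[:top_k]

    if __name__ == "__main__":
        # Hypothetical triples that an AMR parser might produce for two
        # captions such as "a man rides a horse" and "a man rides a bike".
        captions = [
            [("ride-01", ":ARG0", "man"), ("ride-01", ":ARG1", "horse")],
            [("ride-01", ":ARG0", "man"), ("ride-01", ":ARG1", "bike")],
        ]
        kg = build_endogenous_kg(captions)
        print(associate(kg, "ride-01"))
        # -> [((':ARG0', 'man'), 2), ((':ARG1', 'horse'), 1), ((':ARG1', 'bike'), 1)]

In the paper itself, the associated concepts are folded back into the caption decoder as additional semantic features; the sketch only shows the aggregation and lookup step.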


Notes

  1. https://cocodataset.org/.

  2. https://github.com/tylin/coco-caption.


Author information

Corresponding author

Correspondence to Ting Wang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chen, F., Li, X., Tang, J., Li, S., Wang, T. (2024). Benefit from AMR: Image Captioning with Explicit Relations and Endogenous Knowledge. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14332. Springer, Singapore. https://doi.org/10.1007/978-981-97-2390-4_25

  • DOI: https://doi.org/10.1007/978-981-97-2390-4_25

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2389-8

  • Online ISBN: 978-981-97-2390-4

  • eBook Packages: Computer Science; Computer Science (R0)
