Diagram Question Answering with Joint Training and Bottom-Up and Top-Down Attention

Zhang, Ke; Li, Xiao; Cheng, Gong

doi:10.1007/978-981-19-8300-9_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1711)

Included in the following conference series:

China Conference on Knowledge Graph and Semantic Computing

Abstract

Diagram question answering is a challenging multi-modal machine learning task that focuses on answering questions about given diagrams in specific fields. Compared to natural images, these diagrams use more abstract expressions and encode more complex logical relations, which makes diagram question answering more difficult. In this paper, we propose a new approach for the diagram question answering task. We add bottom-up and top-down attention to identify the regions of interest relevant to each question, and we use a single model to jointly train on multiple-choice questions and true/false questions. On the test dataset of the official CCKS 2022 textbook diagram question answering session, our approach achieves an accuracy of 58.09%.
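
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the bottom-up step has already produced one feature vector per diagram region, uses a plain dot product for the top-down (question-guided) attention, and assumes a true/false question can be scored as a two-option multiple-choice question so that one scorer serves both types. All function names, the additive fusion, and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_down_attention(region_feats, question_vec):
    # Weight each bottom-up region feature by its relevance to the question.
    weights = softmax([dot(r, question_vec) for r in region_feats])
    dim = len(region_feats[0])
    attended = [sum(w * r[d] for w, r in zip(weights, region_feats))
                for d in range(dim)]
    return weights, attended

def score_options(region_feats, question_vec, option_vecs):
    # One scorer for any number of options (assumption: additive fusion).
    _, attended = top_down_attention(region_feats, question_vec)
    fused = [a + q for a, q in zip(attended, question_vec)]
    return softmax([dot(fused, o) for o in option_vecs])

def joint_loss(batch):
    # Mean cross-entropy over a mixed batch of both question types.
    losses = []
    for item in batch:
        probs = score_options(item["regions"], item["question"], item["options"])
        losses.append(-math.log(probs[item["label"]]))
    return sum(losses) / len(losses)

# Toy data: two regions, one four-option item and one true/false item.
regions = [[1.0, 0.0], [0.0, 1.0]]
batch = [
    {"regions": regions, "question": [2.0, 0.0],
     "options": [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]], "label": 0},
    {"regions": regions, "question": [0.0, 2.0],
     "options": [[0.0, 1.0], [0.0, -1.0]], "label": 0},  # index 0 = "true"
]
loss = joint_loss(batch)
```

Because both question types pass through the same `score_options` function, a single set of parameters (here only implicit in the toy vectors) would receive gradients from both kinds of items during training, which is the point of training them jointly.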

Acknowledgements

This work was supported in part by the NSFC (62072224) and in part by the Beijing Academy of Artificial Intelligence (BAAI).

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Ke Zhang, Xiao Li & Gong Cheng

Corresponding author

Correspondence to Ke Zhang.

Editor information

Editors and Affiliations

Zhejiang University, Hangzhou, China
Ningyu Zhang
Southeast University, Nanjing, China
Meng Wang
Southeast University, Nanjing, China
Tianxing Wu
Nanjing University, Nanjing, China
Wei Hu
National University of Singapore, Singapore, Singapore
Shumin Deng

About this paper

Cite this paper

Zhang, K., Li, X., Cheng, G. (2022). Diagram Question Answering with Joint Training and Bottom-Up and Top-Down Attention. In: Zhang, N., Wang, M., Wu, T., Hu, W., Deng, S. (eds) CCKS 2022 - Evaluation Track. CCKS 2022. Communications in Computer and Information Science, vol 1711. Springer, Singapore. https://doi.org/10.1007/978-981-19-8300-9_8

DOI: https://doi.org/10.1007/978-981-19-8300-9_8
Published: 02 December 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8299-6
Online ISBN: 978-981-19-8300-9
eBook Packages: Computer Science, Computer Science (R0)
