Abstract
Diagram question answering is a challenging multi-modal machine learning task that focuses on answering questions about given diagrams in specific fields. Compared to natural images, these diagrams have more abstract expressions and more complex logical relations, which makes diagram question answering more difficult. In this paper, we propose a new approach to the diagram question answering task. We add bottom-up and top-down attention to identify the regions of interest relevant to each question, and we use a single model to jointly train on multiple-choice questions and true/false questions. On the test dataset of the official CCKS2022 textbook diagram question answering session, our approach achieves an accuracy of 58.09%.
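One way to read the joint training idea is that both question types can be cast as "pick one candidate answer", so a single scorer and a single loss cover a mixed batch. The sketch below illustrates that reading only; it is not the authors' implementation. The dictionary layout, the names as_candidates, joint_loss, and accuracy, and the score_fn callable are all hypothetical, and the attention-based visual features described in the abstract are left out entirely.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer signature: (question text, candidate answer) -> raw score.
ScoreFn = Callable[[str, str], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def as_candidates(question: Dict) -> List[str]:
    """Map both question types onto one format (an assumed convention).

    A true/false question becomes a two-option multiple-choice question.
    """
    if question["type"] == "true_false":
        return ["true", "false"]
    return list(question["options"])


def joint_loss(batch: Sequence[Dict], score_fn: ScoreFn) -> float:
    """Mean cross-entropy over a mixed batch, using one shared scorer."""
    total = 0.0
    for q in batch:
        candidates = as_candidates(q)
        probs = softmax([score_fn(q["text"], c) for c in candidates])
        total += -math.log(probs[q["label"]])
    return total / len(batch)


def accuracy(batch: Sequence[Dict], score_fn: ScoreFn) -> float:
    """Fraction of questions whose highest-scoring candidate is the label."""
    correct = 0
    for q in batch:
        candidates = as_candidates(q)
        scores = [score_fn(q["text"], c) for c in candidates]
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == q["label"])
    return correct / len(batch)


# Toy mixed batch; the questions and labels are made up for illustration.
toy_batch = [
    {"type": "multiple_choice", "text": "Which node is the root?",
     "options": ["A", "B", "C", "D"], "label": 0},
    {"type": "true_false", "text": "The graph is a tree.", "label": 1},
]
```

In a real system score_fn would be a trained network that also takes the diagram as input; here any callable with the assumed signature can be passed, for example joint_loss(toy_batch, lambda question, candidate: 0.0) to get the loss of an uninformed scorer.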
Acknowledgements
This work was supported in part by the NSFC (62072224) and in part by the Beijing Academy of Artificial Intelligence (BAAI).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, K., Li, X., Cheng, G. (2022). Diagram Question Answering with Joint Training and Bottom-Up and Top-Down Attention. In: Zhang, N., Wang, M., Wu, T., Hu, W., Deng, S. (eds) CCKS 2022 - Evaluation Track. CCKS 2022. Communications in Computer and Information Science, vol 1711. Springer, Singapore. https://doi.org/10.1007/978-981-19-8300-9_8
DOI: https://doi.org/10.1007/978-981-19-8300-9_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8299-6
Online ISBN: 978-981-19-8300-9
eBook Packages: Computer Science (R0)