Abstract
Charts, as a vital part of the language of visualization, are omnipresent in the real world, and understanding them is crucial for unveiling implicit data insights. The evolution of large-scale models has marked significant milestones in chart comprehension. However, comprehending multiple charts jointly remains challenging, both because multi-chart reasoning is inherently complex and because constructing datasets that span multiple charts is intricate. In this study, we introduce DGE, a logic-based generation engine for multi-chart question-answering datasets that, from simple data input alone, produces diverse joint charts and logically complex questions. It employs logical templates to guide question generation, ensuring excellent scalability. Leveraging the DGE engine, we propose MCQA, the first large-scale dataset for joint reasoning question answering over multiple charts, comprising 22,860 chart pairs and 100,331 complex questions, each annotated with an inference process. Finally, we evaluate several baselines on the MCQA dataset, establishing a research foundation for the chart question answering community. The MCQA dataset is available at https://github.com/ICALK-CVU/MCQA.
Anran Wu and Shuwen Yang contributed equally.
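The abstract describes DGE as instantiating logical templates over raw input data, so that each generated question carries a program that computes its ground-truth answer and reasoning trace. The sketch below illustrates that general idea; it is a minimal, hypothetical reconstruction, and every name in it (`TEMPLATES`, `generate_questions`, the template fields) is illustrative, not the authors' actual API.

```python
import random

# Hypothetical logical templates: each pairs a question pattern over a
# chart pair with a small program that derives the ground-truth answer
# and a reasoning trace from the underlying data.
TEMPLATES = [
    {
        "pattern": "Is the total of {s1} in chart A larger than the total of {s2} in chart B?",
        "program": lambda a, b: ("Yes" if sum(a) > sum(b) else "No",
                                 f"sum(A)={sum(a)}, sum(B)={sum(b)}"),
    },
    {
        "pattern": "What is the difference between the maximum of {s1} and the maximum of {s2}?",
        "program": lambda a, b: (max(a) - max(b),
                                 f"max(A)={max(a)}, max(B)={max(b)}"),
    },
]

def generate_questions(chart_a, chart_b, n=2, seed=0):
    """Instantiate logical templates over a chart pair.

    chart_a / chart_b: dicts mapping a series name to its list of values.
    Returns (question, answer, reasoning) triples.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        t = rng.choice(TEMPLATES)
        s1 = rng.choice(list(chart_a))
        s2 = rng.choice(list(chart_b))
        answer, trace = t["program"](chart_a[s1], chart_b[s2])
        out.append((t["pattern"].format(s1=s1, s2=s2), answer, trace))
    return out

# Toy chart pair: one series per chart.
qas = generate_questions({"sales": [3, 5, 2]}, {"costs": [1, 4, 2]})
for question, answer, reasoning in qas:
    print(question, "->", answer, "|", reasoning)
```

Because each template is a program rather than a fixed string, every generated question comes with a verifiable answer and an inference trace for free, which is what makes this style of pipeline scalable.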
Acknowledgement
This research is funded by the National Key Research and Development Program of China (No. 2021ZD0114002), and the computation is performed on the ECNU Multifunctional Platform for Innovation (001).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wu, A., Yang, S., Xia, Y., Wu, X., Ma, T., He, L. (2025). Charting the Uncharted: Building and Analyzing a Multifaceted Chart Question Answering Dataset for Complex Logical Reasoning Process. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15035. Springer, Singapore. https://doi.org/10.1007/978-981-97-8620-6_2
DOI: https://doi.org/10.1007/978-981-97-8620-6_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8619-0
Online ISBN: 978-981-97-8620-6
eBook Packages: Computer Science (R0)