Abstract
Charts, as a vital part of the language of visualization, are omnipresent in the real world, and understanding them is crucial for unveiling implicit data insights. The evolution of large-scale models has marked significant milestones in chart comprehension. However, comprehending multiple charts jointly remains challenging, both because multi-chart reasoning is inherently complex and because constructing datasets that span multiple charts is intricate. In this study, we introduce DGE, a logic-based generation engine for multi-chart question-answering datasets that, from simple data input alone, produces diverse joint charts and logically complex questions. It employs logical templates to guide question generation, ensuring excellent scalability. Leveraging the DGE engine, we propose MCQA, the first large-scale dataset for joint reasoning question answering over multiple charts, comprising 22,860 chart pairs and 100,331 complex questions, each annotated with an inference process. Finally, we evaluate several baselines on the MCQA dataset, establishing a research foundation for the chart question answering community. The MCQA dataset is available at https://github.com/ICALK-CVU/MCQA.
Anran Wu and Shuwen Yang contributed equally.
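The abstract describes DGE as instantiating logical templates over raw input data, so that each generated question carries a program that computes its ground-truth answer and reasoning trace. The sketch below illustrates that general idea; it is a minimal, hypothetical reconstruction, and every name in it (`TEMPLATES`, `generate_questions`, the template fields) is illustrative, not the authors' actual API.

```python
import random

# Hypothetical logical templates: each pairs a question pattern over a
# chart pair with a small program that derives the ground-truth answer
# and a reasoning trace from the underlying data.
TEMPLATES = [
    {
        "pattern": "Is the total of {s1} in chart A larger than the total of {s2} in chart B?",
        "program": lambda a, b: ("Yes" if sum(a) > sum(b) else "No",
                                 f"sum(A)={sum(a)}, sum(B)={sum(b)}"),
    },
    {
        "pattern": "What is the difference between the maximum of {s1} and the maximum of {s2}?",
        "program": lambda a, b: (max(a) - max(b),
                                 f"max(A)={max(a)}, max(B)={max(b)}"),
    },
]

def generate_questions(chart_a, chart_b, n=2, seed=0):
    """Instantiate logical templates over a chart pair.

    chart_a / chart_b: dicts mapping a series name to its list of values.
    Returns (question, answer, reasoning) triples.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        t = rng.choice(TEMPLATES)
        s1 = rng.choice(list(chart_a))
        s2 = rng.choice(list(chart_b))
        answer, trace = t["program"](chart_a[s1], chart_b[s2])
        out.append((t["pattern"].format(s1=s1, s2=s2), answer, trace))
    return out

# Toy chart pair: one series per chart.
qas = generate_questions({"sales": [3, 5, 2]}, {"costs": [1, 4, 2]})
for question, answer, reasoning in qas:
    print(question, "->", answer, "|", reasoning)
```

Because each template is a program rather than a fixed string, every generated question comes with a verifiable answer and an inference trace for free, which is what makes this style of pipeline scalable.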
Acknowledgement
This research is funded by the National Key Research and Development Program of China (No. 2021ZD0114002), and the computation is performed on the ECNU Multifunctional Platform for Innovation (001).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wu, A., Yang, S., Xia, Y., Wu, X., Ma, T., He, L. (2025). Charting the Uncharted: Building and Analyzing a Multifaceted Chart Question Answering Dataset for Complex Logical Reasoning Process. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15035. Springer, Singapore. https://doi.org/10.1007/978-981-97-8620-6_2
DOI: https://doi.org/10.1007/978-981-97-8620-6_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8619-0
Online ISBN: 978-981-97-8620-6
eBook Packages: Computer Science (R0)