Chart classification: an empirical comparative study of different learning models

Charts are powerful tools for visualizing and comparing data. Their use for presenting information keeps growing because they are easy to read and visually appealing. As the number of documents containing various chart types increases, chart classification has become an important task for downstream applications such as chart data recovery and chart replenishment. Although the literature reports various studies on chart classification using different classification methods, three important concerns remain: small dataset sizes, a small number of chart types, and inconsistencies in the performance reported across studies. Motivated by these concerns, this paper curates a large dataset of real chart images (110k samples) covering a large number of chart types (24 chart types) and evaluates 21 different machine learning models. To the best of our knowledge, this is the largest real chart dataset (in both sample size and number of chart types) reported in the literature to date. We further study (i) the effect of dataset size on the classification model, (ii) the nature of chart noises and their influence on classification performance, and (iii) confusing chart pairs that lead to misclassification.
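
As an illustration of point (iii), the following is a minimal sketch of how confusing chart pairs could be ranked from a classifier's predictions. It is not the procedure used in the paper: the function name `confusing_pairs`, the `top_k` parameter, and the toy labels are all hypothetical, and the sketch simply counts how often two chart types are mistaken for each other in either direction.

```python
from collections import Counter


def confusing_pairs(y_true, y_pred, top_k=3):
    """Rank unordered class pairs by how often one is predicted as the other.

    Hypothetical helper for illustration; not the paper's method.
    """
    counts = Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label != pred_label:
            # Sort the two labels so (line, area) and (area, line) share one key.
            counts[tuple(sorted((true_label, pred_label)))] += 1
    return counts.most_common(top_k)


# Toy labels invented for this example only; they are not results from the paper.
y_true = ["bar", "bar", "line", "line", "area", "pie", "area"]
y_pred = ["bar", "histogram", "area", "line", "line", "pie", "line"]

pairs = confusing_pairs(y_true, y_pred)
for (class_a, class_b), n_errors in pairs:
    print(f"{class_a} <-> {class_b}: {n_errors} misclassifications")
```

Counting pairs without regard to direction is a design choice made here for simplicity; a directed count (true type to predicted type) would show which of the two chart types is more often mistaken for the other.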


  (2024) Swin-chart. Pattern Recognition Letters 185:C (203-209). DOI: 10.1016/j.patrec.2024.08.012. Online publication date: 1-Sep-2024
  (2023) Chart classification: a survey and benchmarking of different state-of-the-art methods. International Journal on Document Analysis and Recognition 27:1 (19-44). DOI: 10.1007/s10032-023-00443-w. Online publication date: 20-Jun-2023
  (2022) Effect of attention and triplet loss on chart classification: a study on noisy charts and confusing chart pairs. Journal of Intelligent Information Systems 60:3 (731-758). DOI: 10.1007/s10844-022-00741-5. Online publication date: 6-Sep-2022

    1. chart image classification
    2. chart noise
    3. confusing chart pairs



