DOI: 10.1145/3490035.3490291

Chart classification: an empirical comparative study of different learning models

Published: 19 December 2021

Abstract

Charts are powerful tools for visualizing and comparing data. The use of charts to represent information keeps growing because of their simple and aesthetically attractive structure. With the increase in the number of documents containing various chart types, chart classification has become an important task for downstream applications such as chart data recovery and chart replenishment. Although various studies on chart classification using different classification methods have been reported in the literature, three important concerns remain: small dataset sizes, a small number of chart types, and inconsistencies in the performance reported across studies. Motivated by these concerns, this paper curates a large dataset of real chart images (110k samples) covering a large number of chart types (24 chart types) and evaluates 21 different machine learning models. To the best of our knowledge, this is the largest real chart dataset (in both sample size and number of chart types) reported in the literature to date. We further study (i) the effect of dataset size on the classification model, (ii) the nature of chart noise and its influence on classification performance, and (iii) confusing chart pairs that lead to misclassification.
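As a rough illustration of what studying "confusing chart pairs" can involve, the sketch below counts how often two classes are mistaken for each other and ranks the pairs. It is not the authors' method: the function name `confusing_pairs`, the class labels, and the toy predictions are all hypothetical and were invented for this example.

```python
from collections import Counter


def confusing_pairs(y_true, y_pred, top_k=3):
    """Rank unordered class pairs by how often one is mistaken for the other."""
    counts = Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label != pred_label:
            # Treat (a, b) and (b, a) as the same pair of confusable classes.
            counts[tuple(sorted((true_label, pred_label)))] += 1
    # Most frequent pairs first; ties are broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy labels, invented for illustration only (not data from the paper).
y_true = ["bar", "bar", "line", "line", "area", "pie", "area"]
y_pred = ["bar", "histogram", "area", "line", "line", "pie", "line"]

pairs = confusing_pairs(y_true, y_pred)
print(pairs)  # [(('area', 'line'), 3), (('bar', 'histogram'), 1)]
```

In practice the same ranking could be read off the off-diagonal entries of a confusion matrix computed on a held-out test set.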

Cited By

  • (2024) Swin-chart. Pattern Recognition Letters 185:C (203-209). DOI: 10.1016/j.patrec.2024.08.012. Online publication date: 1-Sep-2024
  • (2023) Chart classification: a survey and benchmarking of different state-of-the-art methods. International Journal on Document Analysis and Recognition 27:1 (19-44). DOI: 10.1007/s10032-023-00443-w. Online publication date: 20-Jun-2023
  • (2022) Effect of attention and triplet loss on chart classification: a study on noisy charts and confusing chart pairs. Journal of Intelligent Information Systems 60:3 (731-758). DOI: 10.1007/s10844-022-00741-5. Online publication date: 6-Sep-2022

    Published In

    ACM Other conferences
    ICVGIP '21: Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing
    December 2021
    428 pages
    ISBN: 9781450375962
    DOI: 10.1145/3490035
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. chart image classification
    2. chart noise
    3. confusing chart pairs

    Qualifiers

    • Research-article

    Conference

    ICVGIP '21

    Acceptance Rates

    Overall Acceptance Rate 95 of 286 submissions, 33%
