Chart classification: an empirical comparative study of different learning models

Authors:

Sanasam Ranbir Singh,

Prabin K. Bora

ICVGIP '21: Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing

Article No.: 32, Pages 1 - 9

https://doi.org/10.1145/3490035.3490291

Published: 19 December 2021

Abstract

Charts are powerful tools for visualizing and comparing data. The use of charts to represent information has grown over time because of their simple and visually appealing structure. With the increase in the number of documents containing various chart types, chart classification has become an important task for downstream applications such as chart data recovery and chart replenishment. Although the literature reports various studies on chart classification using different classification methods, three important concerns remain: small dataset sizes, a small number of chart types, and inconsistencies in the performance reported across studies. Motivated by these concerns, this paper curates a large dataset of real chart images (110k samples) covering a large number of chart types (24 chart types) and evaluates 21 different machine learning models. To the best of our knowledge, this is the largest real chart dataset, in both sample size and number of chart types, reported in the literature to date. We further study (i) the effect of dataset size on the classification model, (ii) the nature of chart noise and its influence on classification performance, and (iii) confusing chart pairs that lead to misclassification.
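As a rough illustration of what identifying confusing chart pairs could involve, the sketch below counts how often two classes are mistaken for each other in a classifier's predictions. It is not the authors' procedure: the function name `confusing_pairs`, the class names, and the label lists are all hypothetical and chosen only for the example.

```python
from collections import Counter


def confusing_pairs(y_true, y_pred, top_k=3):
    """Return the top_k most frequent unordered pairs of confused classes.

    A misclassification of class A as B and of B as A both count
    toward the same unordered pair (A, B).
    """
    counts = Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label != pred_label:
            # Sort so that (A, B) and (B, A) map to the same key.
            counts[tuple(sorted((true_label, pred_label)))] += 1
    return counts.most_common(top_k)


# Hypothetical labels for illustration only; not data from the paper.
y_true = ["bar", "bar", "line", "area", "line", "pie", "area"]
y_pred = ["bar", "histogram", "area", "line", "area", "pie", "area"]

pairs = confusing_pairs(y_true, y_pred)
for (class_a, class_b), n_errors in pairs:
    print(f"{class_a} <-> {class_b}: {n_errors} misclassifications")
```

Treating pairs as unordered is a design choice made here for brevity; keeping the direction (true class versus predicted class) would instead correspond to reading off-diagonal cells of a confusion matrix.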

Cited By

Dhote A, Javed M, Doermann D (2024). Swin-chart. Pattern Recognition Letters 185:C (203-209). DOI: 10.1016/j.patrec.2024.08.012. Online publication date: 1-Sep-2024
https://dl.acm.org/doi/10.1016/j.patrec.2024.08.012
Thiyam J, Singh S, Bora P (2023). Chart classification: a survey and benchmarking of different state-of-the-art methods. International Journal on Document Analysis and Recognition 27:1 (19-44). DOI: 10.1007/s10032-023-00443-w. Online publication date: 20-Jun-2023
https://dl.acm.org/doi/10.1007/s10032-023-00443-w
Thiyam J, Singh S, Bora P (2022). Effect of attention and triplet loss on chart classification: a study on noisy charts and confusing chart pairs. Journal of Intelligent Information Systems 60:3 (731-758). DOI: 10.1007/s10844-022-00741-5. Online publication date: 6-Sep-2022
https://dl.acm.org/doi/10.1007/s10844-022-00741-5

Index Terms

Chart classification: an empirical comparative study of different learning models
1. Computing methodologies
  1. Computer graphics

Published In

ICVGIP '21: Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing

December 2021

428 pages

ISBN:9781450375962

DOI:10.1145/3490035

General Chairs: Rama Chellappa (Johns Hopkins University), Santanu Chaudhury (IIT Jodhpur)

Program Chairs: Chetan Arora (IIT Delhi), Parag Chaudhuri (IIT Bombay), Subhransu Maji (University of Massachusetts, Amherst)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICVGIP '21

ICVGIP '21: Indian Conference on Computer Vision, Graphics and Image Processing

December 19 - 22, 2021

Jodhpur, India

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%
