research-article

ChartSense: Interactive Data Extraction from Chart Images

Authors:
Daekyoung Jung, Seoul National University, Seoul, Republic of Korea
Wonjae Kim, Seoul National University, Seoul, Republic of Korea
Hyunjoo Song, Seoul National University, Seoul, Republic of Korea
Jeong-in Hwang, Seoul National University, Seoul, Republic of Korea
Bongshin Lee, Microsoft Research, Redmond, WA, USA
Bohyoung Kim, Hankuk University of Foreign Studies, Yongin-si, Gyeonggi-do, Republic of Korea
Jinwook Seo, Seoul National University, Seoul, Republic of Korea

CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, May 2017, Pages 6706–6717. https://doi.org/10.1145/3025453.3025957

Published: 02 May 2017

ABSTRACT

Charts are commonly used to present data in digital documents such as web pages, research papers, and presentation slides. When the underlying data is not available, it must be extracted from the chart image before it can be analyzed further or used to redesign the chart for more accurate perception. In this paper, we present ChartSense, an interactive chart data extraction system. ChartSense first determines the chart type of a given chart image using a deep-learning-based classifier, and then extracts the underlying data from the image using semi-automatic, interactive extraction algorithms optimized for each chart type. To evaluate chart type classification accuracy, we compared ChartSense with ReVision, a system with a state-of-the-art chart type classifier, and found that ChartSense was more accurate. To evaluate data extraction performance, we conducted a user study comparing ChartSense with WebPlotDigitizer, one of the most effective publicly accessible chart data extraction tools. Our results showed that ChartSense was better than WebPlotDigitizer in terms of task completion time, error rate, and subjective preference.
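
The two-stage flow described in the abstract (classify the chart type, then run an extraction routine chosen for that type) can be sketched roughly as follows. This is a minimal illustration, not the ChartSense implementation: every name here (`extract_chart_data`, `user_chart_type`, the `Image` and `Points` aliases) is hypothetical, the classifier and extractors are passed in as plain callables, and the user override is only one assumed way a semi-automatic system might let a person correct a wrong prediction.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types and names for illustration; none are taken from ChartSense.
Image = List[List[int]]                 # stand-in for pixel data
Points = List[Tuple[float, float]]      # extracted (x, y) values
Classifier = Callable[[Image], str]     # returns a chart-type label
Extractor = Callable[[Image], Points]   # type-specific extraction routine


def extract_chart_data(
    image: Image,
    classify: Classifier,
    extractors: Dict[str, Extractor],
    user_chart_type: Optional[str] = None,
) -> Tuple[str, Points]:
    """Pick a chart type, then run the extractor registered for that type."""
    # Assumption: a user-supplied type takes precedence over the prediction.
    chart_type = user_chart_type if user_chart_type is not None else classify(image)
    if chart_type not in extractors:
        raise KeyError(f"no extractor registered for chart type {chart_type!r}")
    return chart_type, extractors[chart_type](image)


if __name__ == "__main__":
    # Toy stand-ins: a classifier that always answers "bar" and two dummy extractors.
    toy_image = [[0, 1], [1, 0]]
    toy_extractors = {
        "bar": lambda img: [(0.0, 1.0), (1.0, 2.0)],
        "line": lambda img: [(0.0, 0.5)],
    }
    print(extract_chart_data(toy_image, lambda img: "bar", toy_extractors))
    print(extract_chart_data(toy_image, lambda img: "bar", toy_extractors,
                             user_chart_type="line"))
```

Keeping the extractors in a dictionary keyed by chart type means a new chart type can be supported by registering one more routine, without touching the dispatch logic.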



Index Terms

Human-centered computing


Published in
CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems
May 2017
7138 pages
ISBN:9781450346559
DOI:10.1145/3025453
General Chairs: Gloria Mark (University of California Irvine), Susan Fussell (Cornell University)
Program Chairs: Cliff Lampe (University of Michigan), m.c. schraefel (University of Southampton), Juan Pablo Hourcade (University of Iowa), Caroline Appert (Université Paris-Sud), Daniel Wigdor (University of Toronto)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
chart classification
chart recognition
data extraction
deep learning
mixed-initiative interaction

Acceptance Rates
CHI '17 Paper Acceptance Rate: 600 of 2,400 submissions, 25%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%