DOI: 10.1145/3025453.3025957
Research article

ChartSense: Interactive Data Extraction from Chart Images

Published: 02 May 2017

ABSTRACT

Charts are commonly used to present data in digital documents such as web pages, research papers, and presentation slides. When the underlying data is not available, it must be extracted from the chart image before it can be used for further analysis or for redesigning the chart to support more accurate perception. In this paper, we present ChartSense, an interactive chart data extraction system. ChartSense first determines the chart type of a given chart image using a deep learning-based classifier, and then extracts the underlying data from the image using semi-automatic, interactive extraction algorithms optimized for each chart type. To evaluate chart type classification accuracy, we compared ChartSense with ReVision, a system with a state-of-the-art chart type classifier, and found that ChartSense was more accurate. In addition, to evaluate data extraction performance, we conducted a user study comparing ChartSense with WebPlotDigitizer, one of the most effective publicly available chart data extraction tools. Our results showed that ChartSense outperformed WebPlotDigitizer in task completion time, error rate, and subjective preference.
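The abstract describes a two-stage pipeline: classify the chart type, then hand the image to an extractor specialized for that type. The following is a minimal sketch of that control flow under stated assumptions, not ChartSense's implementation. The classifier is a caller-supplied stand-in for the deep learning model. Mark positions are assumed to be already detected, and in an interactive tool confirmed by the user. Pixel-to-data conversion uses a hypothetical two-tick linear axis calibration. The optional `override` models a user correcting the predicted type. All names (`ChartImage`, `AxisCalibration`, `EXTRACTORS`, `extract_data`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x_px, y_px) in image coordinates


@dataclass
class ChartImage:
    # Hypothetical stand-in: mark positions are assumed to be already detected
    # (and, in an interactive tool, confirmed or corrected by the user).
    pixel_points: List[Point]


@dataclass
class AxisCalibration:
    """Hypothetical linear pixel-to-data mapping from two reference ticks."""
    p1: float  # pixel coordinate of the first tick
    v1: float  # data value at the first tick
    p2: float  # pixel coordinate of the second tick
    v2: float  # data value at the second tick

    def to_data(self, pixel: float) -> float:
        if self.p1 == self.p2:
            raise ValueError("reference ticks must be at distinct pixel positions")
        # Linear interpolation (or extrapolation) between the two ticks.
        return self.v1 + (pixel - self.p1) * (self.v2 - self.v1) / (self.p2 - self.p1)


Extractor = Callable[[ChartImage, AxisCalibration, AxisCalibration], list]


def extract_bar(img: ChartImage, x_axis: AxisCalibration, y_axis: AxisCalibration) -> list:
    # Bars ordered left to right; each value is read from the bar top's y pixel.
    return [(i, y_axis.to_data(y)) for i, (_, y) in enumerate(sorted(img.pixel_points))]


def extract_line(img: ChartImage, x_axis: AxisCalibration, y_axis: AxisCalibration) -> list:
    # Line vertices ordered by x; both coordinates are mapped to data space.
    return [(x_axis.to_data(x), y_axis.to_data(y)) for x, y in sorted(img.pixel_points)]


# Registry of type-specific extractors keyed by the classifier's label.
EXTRACTORS: Dict[str, Extractor] = {"bar": extract_bar, "line": extract_line}


def extract_data(
    img: ChartImage,
    classify: Callable[[ChartImage], str],
    x_axis: AxisCalibration,
    y_axis: AxisCalibration,
    override: Optional[str] = None,
) -> Tuple[str, list]:
    # Stage 1: predict the chart type; a user-supplied label takes precedence.
    chart_type = override if override is not None else classify(img)
    # Stage 2: dispatch to the extractor specialized for that chart type.
    extractor = EXTRACTORS.get(chart_type)
    if extractor is None:
        raise ValueError(f"no extractor registered for chart type {chart_type!r}")
    return chart_type, extractor(img, x_axis, y_axis)


if __name__ == "__main__":
    y_axis = AxisCalibration(p1=400, v1=0, p2=100, v2=30)  # y pixels grow downward
    x_axis = AxisCalibration(p1=50, v1=0, p2=250, v2=10)
    bars = ChartImage([(150, 100), (50, 250), (250, 400)])
    print(extract_data(bars, lambda img: "bar", x_axis, y_axis))
    # ('bar', [(0, 15.0), (1, 30.0), (2, 0.0)])
```

The sketch keeps extractors in a registry keyed by chart type. Under that design, a new type-specific algorithm can be added without touching the classifier. This is one plausible way to organize "extraction algorithms optimized for each chart type," not necessarily the paper's.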


Supplemental Material

  • pn4070-file3.mp4 (MP4, 18.6 MB)
  • pn4070p.mp4 (MP4, 1.5 MB)

References

  1. Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. 2011. D³ data-driven documents. IEEE Trans. Vis. Comput. Graphics 17, 12: 2301--2309.
  2. Léon Bottou. 2012. Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade, Grégoire Montavon, Geneviève Orr, and Klaus-Robert Müller (eds.). Springer Berlin Heidelberg, 421--436.
  3. John Canny. 1986. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8, 6: 679--698.
  4. Zhe Chen, Michael Cafarella, and Eytan Adar. 2015. DiagramFlyer: A search engine for data-driven diagrams. In Proceedings of the 24th International Conference on World Wide Web Companion (WWW '15), 183--186.
  5. Sagnik Ray Choudhury and Clyde Lee Giles. 2015. An architecture for information extraction from figures in digital libraries. In Proceedings of the 24th International Conference on World Wide Web Companion (WWW '15), 667--672.
  6. Jinglun Gao, Yin Zhou, and Kenneth E. Barner. 2012. View: Visual information extraction widget for improving chart images accessibility. In Proceedings of the 19th IEEE International Conference on Image Processing (ICIP '12), 2865--2868.
  7. Tong Gao, Mira Dontcheva, Eytan Adar, Zhicheng Liu, and Karrie G. Karahalios. 2015. DataTone: Managing ambiguity in natural language interfaces for data visualization. In Proceedings of the 28th Annual ACM Symposium on User Interface Software and Technology (UIST '15), 489--500.
  8. Arnd Gross, Sibylle Schirm, and Markus Scholz. 2014. Ycasd: A tool for capturing and scaling data from graphical representations. BMC Bioinformatics 15, 1: 219.
  9. Jonathan Harper and Maneesh Agrawala. 2014. Deconstructing and restyling D3 visualizations. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology (UIST '14), 253--262.
  10. Christopher G. Healey, Sarat Kocherlakota, Vivek Rao, Reshma Mehta, and Renee St. Amant. 2008. Visual perception and mixed-initiative interaction for assisted visualization design. IEEE Trans. Vis. Comput. Graphics 14, 2: 396--411.
  11. Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '99), 159--166.
  12. Weihua Huang, Chew Lim Tan, and Wee Kheng Leow. 2004. Model-based chart image recognition. In Graphics Recognition. Recent Advances and Perspectives, Josep Lladós and Young-Bin Kwon (eds.). Springer Berlin Heidelberg, 87--99.
  13. Weihua Huang, Ruizhe Liu, and Chew Lim Tan. 2007. Extraction of vectorized graphical information from scientific chart images. In Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '07), 521--525.
  14. Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia (MM '14), 675--678.
  15. Nicholas Kong and Maneesh Agrawala. 2012. Graphical overlays: Using layered elements to aid chart reading. IEEE Trans. Vis. Comput. Graphics 18, 12: 2631--2638.
  16. Nicholas Kong, Marti A. Hearst, and Maneesh Agrawala. 2014. Extracting references between text and charts via crowdsourcing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '14), 31--40.
  17. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th Neural Information Processing Systems (NIPS '12), 1106--1114.
  18. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 11: 2278--2324.
  19. Yan Liu, Xiaoqing Lu, Yeyang Qin, Zhi Tang, and Jianbo Xu. 2013. Review of chart recognition in document images. In IS&T/SPIE Electronic Imaging, 865410--865410.
  20. Jock Mackinlay. 1986. Automating the design of graphical presentations of relational information. ACM Trans. Graph. 5, 2: 110--141.
  21. Gonzalo Gabriel Méndez, Miguel A. Nacenta, and Sebastien Vandenheste. 2016. iVoLVER: Interactive visual language for visualization extraction and reconstruction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '16), 4073--4085.
  22. Aria Pezeshk and Richard L. Tutwiler. 2011. Automatic feature extraction and text recognition from scanned topographic maps. IEEE Trans. Geosci. Remote Sens. 49, 12: 5047--5063.
  23. Ankit Rohatgi. 2015. WebPlotDigitizer, Version 3.8. Retrieved September 22, 2015 from http://arohatgi.info/WebPlotDigitizer
  24. Manolis Savva, Nicholas Kong, Arti Chhajta, Li Fei-Fei, Maneesh Agrawala, and Jeffrey Heer. 2011. ReVision: Automated classification, analysis and redesign of chart images. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST '11), 393--402.
  25. Julia Schwarz, Scott Hudson, Jennifer Mankoff, and Andrew D. Wilson. 2010. A framework for robust and flexible handling of inputs with uncertainty. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology (UIST '10), 47--56.
  26. Mingyan Shao and Robert P. Futrelle. 2006. Recognition and classification of figures in PDF documents. In Graphics Recognition. Ten Years Review and Future Perspectives, Wenyin Liu and Josep Lladós (eds.). Springer Berlin Heidelberg, 231--242.
  27. Noah Siegel, Zachary Horvitz, Roie Levin, Santosh Divvala, and Ali Farhadi. 2016. FigureSeer: Parsing result-figures in research papers. In Proceedings of the 14th European Conference on Computer Vision (ECCV '16), 664--680.
  28. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '15), 1--9.
  29. Karl Tombre, Salvatore Tabbone, Loïc Pélissier, Bart Lamiroy, and Philippe Dosch. 2002. Text/graphics separation revisited. In Document Analysis Systems V, Daniel Lopresti, Jianying Hu, and Ramanujan Kashi (eds.). Springer Berlin Heidelberg, 200--211.
  30. Bas Tummers. 2015. DataThief III. Retrieved September 22, 2015 from http://www.datathief.org/
  31. Colin Ware. 2012. Information Visualization: Perception for Design (3rd ed.). Elsevier.
  32. Qixiang Ye and David Doermann. 2015. Text detection and recognition in imagery: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 37, 7: 1480--1500.

      Published in

      CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems
      May 2017
      7138 pages
      ISBN: 9781450346559
      DOI: 10.1145/3025453

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher: Association for Computing Machinery, New York, NY, United States


      Acceptance Rates

      CHI '17 paper acceptance rate: 600 of 2,400 submissions (25%). Overall acceptance rate: 6,199 of 26,314 submissions (24%).
