Article

Deriving knowledge from figures for digital libraries

Authors:
Xiaonan Lu, Pennsylvania State University, University Park, PA
James Z. Wang, Pennsylvania State University, University Park, PA
Prasenjit Mitra, Pennsylvania State University, University Park, PA
C. Lee Giles, Pennsylvania State University, University Park, PA

WWW '07: Proceedings of the 16th international conference on World Wide Web, May 2007, Pages 1229–1230
https://doi.org/10.1145/1242572.1242780

Published: 08 May 2007

ABSTRACT

Figures in digital documents contain important information, yet current digital libraries do not summarize or index the information within figures for document retrieval. We present a system for automatic categorization of figures and extraction of data from 2-D plots. A machine-learning-based method categorizes figures into a set of predefined types based on image features. An automated algorithm extracts data values from solid line curves in 2-D plots. The semantic type of each figure and the data values extracted from 2-D plots can be integrated with the textual information within documents to provide more effective document retrieval services for digital library users. Experimental evaluation has demonstrated that our system can produce results suitable for real-world use.
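
The abstract does not describe the extraction algorithm itself, so the following is only a minimal sketch of the general idea of reading data values off a solid line curve. It assumes the plot area has already been cropped and binarized, that it contains a single curve, and that the data values at the axis ends are known. The function name `extract_curve` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def extract_curve(
    pixels: List[List[int]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
) -> List[Tuple[float, Optional[float]]]:
    """Map curve pixels of a binarized plot area to (x, y) data values.

    pixels: 2-D grid, 1 = curve pixel, 0 = background; row 0 is the top row.
    x_range: data values at the left and right edges (assumed known).
    y_range: data values at the bottom and top edges (assumed known).
    """
    height = len(pixels)
    width = len(pixels[0])
    if height < 2 or width < 2:
        raise ValueError("plot area must be at least 2x2 pixels")
    x_min, x_max = x_range
    y_min, y_max = y_range

    points: List[Tuple[float, Optional[float]]] = []
    for col in range(width):
        # Linear mapping from pixel column to data x value.
        x = x_min + (x_max - x_min) * col / (width - 1)
        rows = [r for r in range(height) if pixels[r][col]]
        if not rows:
            # No curve pixel in this column: report a gap.
            points.append((x, None))
            continue
        # A thick line covers several rows; take their mean as the center.
        center = sum(rows) / len(rows)
        # Image rows grow downward while data values grow upward.
        y = y_max - (y_max - y_min) * center / (height - 1)
        points.append((x, y))
    return points


if __name__ == "__main__":
    # A 3x3 plot area holding a rising diagonal line.
    plot = [
        [0, 0, 1],
        [0, 1, 0],
        [1, 0, 0],
    ]
    print(extract_curve(plot, (0.0, 2.0), (0.0, 10.0)))
```

A real plot would also need the steps this sketch skips, such as locating the axes, reading tick labels, and separating overlapping curves. Values recovered this way could then be indexed alongside the document text, as the abstract suggests.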

Index Terms

1. Information systems
  1. Information retrieval

Published in
WWW '07: Proceedings of the 16th international conference on World Wide Web
May 2007
1382 pages
ISBN: 9781595936547
DOI: 10.1145/1242572
General Chairs:
Carey Williamson, University of Calgary, Canada
Mary Ellen Zurko, IBM, USA
Program Chairs:
Peter Patel-Schneider, Bell Labs Research, USA
Prashant Shenoy, University of Massachusetts at Amherst, USA
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
feature extraction
figures
machine learning
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%