poster

A comparison of metadata extraction techniques for crowdsourced bibliographic metadata management

Authors:
Michael Granitzer, Knowledge Management Institute, Know-Center GmbH, Graz, Austria
Maya Hristakeva, Mendeley Ltd., London, UK
Kris Jack, Mendeley Ltd., London, UK
Robert Knight, Mendeley Ltd., London, UK

SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012
Pages 962–964
https://doi.org/10.1145/2245276.2245462

Published: 26 March 2012

ABSTRACT

Social research networks such as Mendeley and CiteULike offer various services for collaboratively managing bibliographic metadata and uploading textual artifacts. A core problem in this setting is extracting bibliographic metadata from those textual artifacts. Our work investigates the use of Conditional Random Fields and Support Vector Machines, as implemented in two state-of-the-art real-world systems, ParsCit and Mendeley Desktop, for automatically extracting bibliographic metadata. We compare the systems' accuracy on two newly created real-world data sets gathered from Mendeley and Linked-Open-Data repositories. Our analysis shows that two-stage SVMs provide reasonable performance in solving the challenge of metadata extraction from user-provided textual artifacts.
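
The abstract does not say how accuracy is computed. As a rough illustration only, the sketch below scores extracted metadata against gold records field by field, using exact match after whitespace and case normalization. The record layout, the field names, and the `field_accuracy` helper are all assumptions made for this example and are not taken from the paper, ParsCit, or Mendeley Desktop.

```python
from typing import Dict, List

# Hypothetical record layout: one dict of field name -> string per document.
Record = Dict[str, str]


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def field_accuracy(
    predicted: List[Record], gold: List[Record], fields: List[str]
) -> Dict[str, float]:
    """Return, per field, the fraction of documents whose extracted value
    exactly matches the gold value after normalization.

    Assumption: a field missing from both records counts as a match,
    because both sides normalize to the empty string.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    scores: Dict[str, float] = {}
    for field in fields:
        correct = sum(
            1
            for p, g in zip(predicted, gold)
            if normalize(p.get(field, "")) == normalize(g.get(field, ""))
        )
        scores[field] = correct / len(gold) if gold else 0.0
    return scores
```

Under these assumptions, running the same function over the output of each system on the same gold set would give directly comparable per-field scores; a real evaluation might prefer token-level or fuzzy matching instead.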

Index Terms

A comparison of metadata extraction techniques for crowdsourced bibliographic metadata management
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Document representation

Published in
SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012
2179 pages
ISBN:9781450308571
DOI:10.1145/2245276
Conference Chairs:
Sascha Ossowski, University Rey Juan Carlos, Spain
Paola Lecca, The Microsoft Research - University of Trento COSBI, Italy
Copyright © 2012 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
evaluation
metadata extraction
research papers
Qualifiers
- poster
Acceptance Rates
SAC '12 Paper Acceptance Rate: 270 of 1,056 submissions, 26%
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%