DOI: 10.1145/2245276.2245462
poster

A comparison of metadata extraction techniques for crowdsourced bibliographic metadata management

Published: 26 March 2012

ABSTRACT

Social research networks such as Mendeley and CiteULike offer various services for collaboratively managing bibliographic metadata and uploading textual artifacts. A core problem in this setting is the extraction of bibliographic metadata from the textual artifacts. Our work investigates the use of Conditional Random Fields and Support Vector Machines, implemented in two state-of-the-art real-world systems, namely ParsCit and the Mendeley Desktop, for automatically extracting bibliographic metadata. We compare the systems' accuracy on two newly created real-world data sets gathered from Mendeley and Linked-Open-Data repositories. Our analysis shows that two-stage SVMs provide reasonable performance in solving the challenge of metadata extraction from user-provided textual artifacts.


Published in

SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012
2179 pages
ISBN: 9781450308571
DOI: 10.1145/2245276
Conference Chairs: Sascha Ossowski, Paola Lecca

Copyright © 2012 Authors

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SAC '12 Paper Acceptance Rate: 270 of 1,056 submissions, 26%
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
