A Frame-Based Approach for Reference Metadata Extraction

  • Conference paper
Technologies and Applications of Artificial Intelligence (TAAI 2014)

Abstract

In this paper, we propose a novel frame-based approach (FBA) and use reference metadata extraction as a case study to demonstrate its advantages. The main contributions of this research are threefold. First, the new frame matching algorithm, based on sequence alignment, compensates for the shortcomings of traditional rule-based approaches, in which rule matching lacks flexibility and generality. Second, approximate matching is adopted to capture reasonable abbreviations or errors in the input reference string, further increasing the coverage of the frames. Third, experiments conducted on extensive datasets show that the same knowledge framework performs equally well on various untrained domains. Compared with a widely used machine learning method, Conditional Random Fields (CRFs), the FBA drastically reduces the average field error rate across all four independent test sets, by about 70% in relative terms (2.24% vs. 7.54%).
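
To make the two ideas in the abstract concrete (frame matching by sequence alignment, and approximate matching of abbreviated or misspelled tokens), here is a minimal sketch. It is not the authors' algorithm: the frame is reduced to a flat list of expected keywords, and the function names, the prefix rule for abbreviations, the similarity threshold, and the gap penalty are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(token: str, keyword: str) -> float:
    """Approximate match score in [0, 1] (illustrative, not from the paper)."""
    a = token.lower().rstrip(".")
    b = keyword.lower().rstrip(".")
    # Assumption: a non-empty prefix counts as an abbreviation,
    # e.g. "Proc." for "Proceedings".
    if a and b.startswith(a):
        return 1.0
    # Otherwise fall back to a character-level similarity ratio,
    # which tolerates small spelling errors.
    return SequenceMatcher(None, a, b).ratio()


def align_score(tokens, frame, gap=-0.5, threshold=0.8):
    """Global (Needleman-Wunsch style) alignment score of tokens against a frame.

    `frame` is a hypothetical, simplified frame: an ordered list of keywords.
    Token/keyword pairs at or above `threshold` score +1, other pairs score -1,
    and skipping a token or a keyword costs `gap`.
    """
    n, m = len(tokens), len(frame)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = 1.0 if similarity(tokens[i - 1], frame[j - 1]) >= threshold else -1.0
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair,  # align token i with keyword j
                dp[i - 1][j] + gap,       # skip an input token
                dp[i][j - 1] + gap,       # skip a frame keyword
            )
    return dp[n][m]
```

For example, aligning the tokens `["Proc.", "of", "the", "Conferense"]` against the frame `["Proceedings", "of", "the", "Conference"]` matches every position despite the abbreviation and the typo. An extra token in the input only costs one gap penalty rather than causing the whole match to fail, which is the kind of flexibility that exact rule matching lacks.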

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hsieh, YL. et al. (2014). A Frame-Based Approach for Reference Metadata Extraction. In: Cheng, SM., Day, MY. (eds) Technologies and Applications of Artificial Intelligence. TAAI 2014. Lecture Notes in Computer Science, vol 8916. Springer, Cham. https://doi.org/10.1007/978-3-319-13987-6_15

  • DOI: https://doi.org/10.1007/978-3-319-13987-6_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13986-9

  • Online ISBN: 978-3-319-13987-6

  • eBook Packages: Computer Science (R0)