A Frame-Based Approach for Reference Metadata Extraction

Hsieh, Yu-Lun; Liu, Shih-Hung; Yang, Ting-Hao; Chen, Yu-Hsuan; Chang, Yung-Chun; Hsieh, Gladys; Shih, Cheng-Wei; Lu, Chun-Hung; Hsu, Wen-Lian

doi:10.1007/978-3-319-13987-6_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8916)

Included in the following conference series:

International Conference on Technologies and Applications of Artificial Intelligence

Abstract

In this paper, we propose a novel frame-based approach (FBA) and use reference metadata extraction as a case study to demonstrate its advantages. The main contributions of this research are threefold. First, the new frame matching algorithm, based on sequence alignment, compensates for the shortcomings of traditional rule-based approaches, in which rule matching lacks flexibility and generality. Second, approximate matching is adopted to capture reasonable abbreviations or errors in the input reference string, further increasing the coverage of the frames. Third, experiments conducted on extensive datasets show that the same knowledge framework performs equally well on various untrained domains. Compared with a widely used machine learning method, Conditional Random Fields (CRFs), the FBA reduces the average field error rate across all four independent test sets by about 70% in relative terms (2.24% vs. 7.54%).
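
The abstract does not give the matching algorithm itself, so the following is only a minimal sketch of the general idea it names: aligning a frame (an ordered list of expected slots) against the tokens of a reference string with a gap-tolerant sequence alignment, and scoring literal slots with approximate string matching so that typos and abbreviations still match. The slot layout, the `GAP` penalty, the 0.8 similarity threshold, and the prefix-as-abbreviation rule are all assumptions made for illustration and are not taken from the paper.

```python
from difflib import SequenceMatcher

GAP = -0.5        # assumed penalty for skipping a slot or a token
THRESHOLD = 0.8   # assumed minimum similarity for a literal slot to match


def similarity(token: str, value: str) -> float:
    """Approximate match in [0, 1] that tolerates typos and abbreviations."""
    a, b = token.lower().rstrip("."), value.lower()
    if a and b.startswith(a):          # a prefix is treated as an abbreviation
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def slot_score(slot, token: str) -> float:
    """Score one (name, kind, value) slot against one token; 0.0 means no match."""
    _name, kind, value = slot
    if kind == "literal":
        s = similarity(token, value)
        return s if s >= THRESHOLD else 0.0
    if kind == "year":
        return 1.0 if token.isdigit() and len(token) == 4 else 0.0
    return 0.0


def align(frame, tokens):
    """Global alignment of slots to tokens; returns (score, matched pairs)."""
    n, m = len(frame), len(tokens)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                s = slot_score(frame[i - 1], tokens[j - 1])
                if s > 0:  # only genuine matches may be aligned
                    candidates.append((score[i - 1][j - 1] + s, "match"))
            if i > 0:
                candidates.append((score[i - 1][j] + GAP, "skip_slot"))
            if j > 0:
                candidates.append((score[i][j - 1] + GAP, "skip_token"))
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right cell to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((frame[i - 1][0], tokens[j - 1]))
            i, j = i - 1, j - 1
        elif move == "skip_slot":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Hypothetical frame for a proceedings-style venue string.
frame = [
    ("proceedings", "literal", "proceedings"),
    ("of", "literal", "of"),
    ("the", "literal", "the"),
    ("conference", "literal", "conference"),
    ("year", "year", None),
]
tokens = ["Procedings", "of", "ACM", "Conf.", "2004"]  # typo, extra token, abbreviation

total, pairs = align(frame, tokens)
print(pairs)
```

In this toy run the misspelled "Procedings" and the abbreviated "Conf." are still paired with their slots, the missing "the" slot is skipped, and the unexpected token "ACM" is left unassigned. An exact-match rule over the same input would fail on the first token, which is the kind of rigidity the abstract attributes to traditional rule matching.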

Author information

Authors and Affiliations

Institute of Information Science, Academia Sinica, Taipei, Taiwan
Yu-Lun Hsieh, Shih-Hung Liu, Ting-Hao Yang, Yu-Hsuan Chen, Yung-Chun Chang, Gladys Hsieh, Cheng-Wei Shih & Wen-Lian Hsu
Innovative Digitech-Enabled Applications & Services Institute, III, Taiwan
Chun-Hung Lu

Editor information

Editors and Affiliations

Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, No. 43, Sec. 4, Keelung Rd., Da’an Dist., 106, Taipei City, Taiwan
Shin-Ming Cheng
Department of Information Management, Tamkang University, No. 151, Yingzhuan Rd., Danshui Dist., 25137, New Taipei City, Taiwan
Min-Yuh Day

About this paper

Cite this paper

Hsieh, YL. et al. (2014). A Frame-Based Approach for Reference Metadata Extraction. In: Cheng, SM., Day, MY. (eds) Technologies and Applications of Artificial Intelligence. TAAI 2014. Lecture Notes in Computer Science, vol 8916. Springer, Cham. https://doi.org/10.1007/978-3-319-13987-6_15

DOI: https://doi.org/10.1007/978-3-319-13987-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13986-9
Online ISBN: 978-3-319-13987-6
eBook Packages: Computer Science, Computer Science (R0)
