DOI: 10.1145/3132847.3133074
short-paper

Citation Metadata Extraction via Deep Neural Network-based Segment Sequence Labeling

Published: 06 November 2017

Abstract

Citation metadata extraction plays an important role in academic information retrieval and knowledge management. Existing work on this task generally uses rule-based, template-based, or learning-based approaches, but these methods usually either rely on handcrafted features or are limited to specific domains. Recently, neural networks have shown strong ability in sequence labeling tasks.
In this paper, we propose a sequence labeling model for citation metadata extraction, called segment sequence labeling. Instead of inferring labels at the word level, the model first divides the input sequence into segments and then computes features of the segments to infer the segment label sequence. We first run experiments to validate the effectiveness of the different parts of the model by comparing it with a CRF-based model and a neural network-based model. Experimental results show that our model outperforms both on most fields. In addition, our model is evaluated on the public datasets UMass and Cora and achieves a significant performance improvement. Our model was trained on data generated from BibTeX files collected from the Web and annotated automatically.
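
The segment-level view and the automatic annotation described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: the field order in `FIELD_ORDER`, the period-based splitting rule in `split_segments`, and the hand-picked surface features in `segment_features` (the paper computes segment features with a neural network, which is omitted here). The sketch only shows how a BibTeX-like record could yield a citation string whose segments are labeled without manual annotation.

```python
import re
from typing import Dict, List, Tuple

# Assumed field order for rendering; real citation styles vary widely.
FIELD_ORDER = ["author", "year", "title", "journal"]


def render_citation(entry: Dict[str, str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Render a BibTeX-like record as a citation string and keep the
    (segment text, field label) pairs produced along the way."""
    labeled = [(entry[f].strip(), f) for f in FIELD_ORDER if entry.get(f)]
    text = ". ".join(seg for seg, _ in labeled) + "."
    return text, labeled


def split_segments(text: str) -> List[str]:
    """Cut a citation string into segments at a period followed by
    whitespace or end of string (an assumed, deliberately naive rule)."""
    return [s.strip() for s in re.split(r"\.(?:\s+|$)", text) if s.strip()]


def segment_features(segment: str) -> Dict[str, float]:
    """Compute a few illustrative surface features for one segment."""
    tokens = segment.split()
    return {
        "n_tokens": float(len(tokens)),
        "digit_ratio": sum(c.isdigit() for c in segment) / max(len(segment), 1),
        "capitalized_ratio": sum(t[0].isupper() for t in tokens) / max(len(tokens), 1),
    }


if __name__ == "__main__":
    # Example record built from a well-known publication.
    entry = {
        "author": "Sepp Hochreiter and Jürgen Schmidhuber",
        "year": "1997",
        "title": "Long short-term memory",
        "journal": "Neural computation",
    }
    text, labeled = render_citation(entry)
    # Each segment is paired with the field it was rendered from.
    for segment, (_, label) in zip(split_segments(text), labeled):
        print(label, "|", segment, "|", segment_features(segment))
```

A labeling model would then predict one label per segment rather than one per word. The naive splitter breaks on abbreviations such as author initials, so a real pipeline would need a more careful segmentation step.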

Cited By

  • (2024) An Anchor Learning Approach for Citation Field Learning. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 12346-12350. DOI: 10.1109/ICASSP48485.2024.10448007. Online publication date: 14-Apr-2024
  • (2022) Vision and natural language for metadata extraction from scientific PDF documents. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, 1-5. DOI: 10.1145/3529372.3533295. Online publication date: 20-Jun-2022
  • (2018) Information extraction from scientific articles. Scientometrics 117:3, 1931-1990. DOI: 10.1007/s11192-018-2921-5. Online publication date: 1-Dec-2018

Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN:9781450349185
DOI:10.1145/3132847
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. academic information extraction
  2. citation metadata extraction
  3. information retrieval
  4. sequence labeling

Qualifiers

  • Short-paper

Conference

CIKM '17

Acceptance Rates

CIKM '17 Paper Acceptance Rate 171 of 855 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
