A New Approach towards Bibliographic Reference Identification, Parsing and Inline Citation Matching

Gupta, Deepank; Morris, Bob; Catapano, Terry; Sautter, Guido

doi:10.1007/978-3-642-03547-0_10

Part of the book series: Communications in Computer and Information Science (CCIS, volume 40)

Included in the following conference series:

International Conference on Contemporary Computing

Abstract

A number of algorithms and approaches have been proposed for the problem of scanning and digitizing research papers. Past work falls into three major approaches: regular-expression-based heuristics, learning-based algorithms, and knowledge-based systems. Our findings point to the inadequacy of existing open-source solutions such as Paracite for papers with “micro-citations” in various European languages. This paper describes work done as part of the Google Summer of Code 2008, which combines regular-expression-based heuristics and knowledge-based systems to develop a system that matches inline citations to their corresponding bibliographic references and identifies and extracts metadata from references. We present the description, implementation, and results of our approach, which improves accuracy and recognition rates.
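To make the matching task concrete, the following is a minimal sketch of regular-expression-based matching of author-year inline citations to reference strings. It is not the system described in the paper: the patterns `INLINE_CITATION` and `REFERENCE`, the function names, and the assumption that references follow a "Surname, I.: Title ... (Year)" layout are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern for author-year inline citations such as
# "Sautter et al. (2006)" or "(Hetzner, 2008)". Illustrative only.
INLINE_CITATION = re.compile(
    r"(?P<author>[A-Z][\w'-]+)"                         # first-author surname
    r"(?:\s+et\s+al\.?|\s+(?:and|&)\s+[A-Z][\w'-]+)?"   # optional co-author part
    r",?\s*\(?(?P<year>(?:18|19|20)\d{2})[a-z]?\)?"     # year, optionally in parentheses
)

# Hypothetical pattern for references laid out as "Surname, I.: ... (Year)".
# The greedy ".*" makes the LAST parenthesised year in the string win.
REFERENCE = re.compile(
    r"^(?P<author>[^,:]+)[,:].*\((?P<year>(?:18|19|20)\d{2})\)"
)


def index_references(references):
    """Map (lowercased first-author surname, year) to the reference string."""
    index = {}
    for ref in references:
        m = REFERENCE.match(ref)
        if m:
            key = (m.group("author").strip().lower(), m.group("year"))
            index.setdefault(key, ref)  # keep the first reference per key
    return index


def match_citations(text, references):
    """Return (author, year, reference) for each inline citation that resolves."""
    index = index_references(references)
    matches = []
    for m in INLINE_CITATION.finditer(text):
        key = (m.group("author").lower(), m.group("year"))
        if key in index:  # unresolved candidates are silently dropped
            matches.append((m.group("author"), m.group("year"), index[key]))
    return matches


SAMPLE_REFERENCES = [
    "Sautter, G., Böhm, K., Agosti, D.: A combining approach to find all "
    "taxon names (FAT). Biodiv. Inf. 3, 46-58 (2006)",
    "Hetzner, E.: A simple method for citation metadata extraction using "
    "hidden markov models. In: Proceedings of the 8th ACM/IEEE-CS joint "
    "conference on Digital Libraries (2008)",
]
SAMPLE_TEXT = (
    "Sautter et al. (2006) combine several techniques, while hidden Markov "
    "models are used elsewhere (Hetzner, 2008)."
)

if __name__ == "__main__":
    for author, year, ref in match_citations(SAMPLE_TEXT, SAMPLE_REFERENCES):
        print(f"{author} {year} -> {ref[:50]}...")
```

A purely pattern-based matcher like this one is brittle: it accepts any capitalized word followed by a year as a candidate and relies on the reference index to filter out false hits. It also cannot distinguish two works by the same first author in the same year, which is one reason to pair such heuristics with additional knowledge about the document.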

References

Jewel, M.: Paracite (2003), http://paracite.eprints.org/developers
Giuffrida, G., Shek, E.C., Yang, J.: Knowledge-based metadata extraction from PostScript files. In: DL 2000: Proceedings of the fifth ACM conference on Digital libraries, pp. 77–84. ACM Press, New York (2000)
Powley, B., Dale, R.: Evidence-based information extraction for high accuracy citation and author name identification. In: Proceedings of RIAO 2007: The 8th Conference on Large-Scale Semantic Access to Content, Pittsburgh, Pa., USA (2007)
Sautter, G., Böhm, K., Agosti, D.: A combining approach to find all taxon names (FAT). Biodiv. Inf. 3, 46–58 (2006)
Sautter, G., Böhm, K., Agosti, D.: A Quantitative Comparison of XML Schemas for Taxonomic. Biodiversity Informatics (2007)
McCallum, A., Nigam, K., Ungar, L.H.: Efficient clustering of high-dimensional data sets with application to reference matching. In: Knowledge Discovery and Data Mining, pp. 169–178 (2000)
Hetzner, E.: A simple method for citation metadata extraction using hidden markov models. In: Proceedings of the 8th ACM/IEEE-CS joint conference on Digital Libraries (2008)
Takasu: Bibliographic Attribute Extraction from Erroneous References Based on a Statistical Model. In: Proceedings of Joint Conference on Digital Libraries (2003)
Huang, I.A., Jan-Ming, H., Kao, H.Y., Lin, S.: Extracting citation metadata from online publication lists using BLAST. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS, vol. 3056, pp. 539–548. Springer, Heidelberg (2004)
Matt, E.D., Winkels, R., Van Engers, T.: Automated Detection of Reference Structures in Law. In: Proceedings of the Conference at University Pantheon, Assas, Paris II France, pp. 41–50 (2006)
Sautter, G., Agosti, D., Böhm, K.: Semi-Automated XML Markup of Biosystematics Legacy Literature with the GoldenGATE Editor. In: Proceedings of PSB, Wailea, HI USA (2007)

Author information

Authors and Affiliations

Netaji Subhas Institute of Technology, Plazi
Deepank Gupta
University of Massachusetts, Boston, Plazi
Bob Morris
Columbia University, Plazi
Terry Catapano
University of Karlsruhe, Plazi
Guido Sautter

Editor information

Editors and Affiliations

Dept. of Computer Sciences, University of Florida, Gainesville, FL, 32611, USA
Sanjay Ranka
Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, USA
Srinivas Aluru
Grid Computing and Distributed Systems Laboratory and NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Australia
Rajkumar Buyya
Department of Computer Science, National Tsing Hua University, Taiwan
Yeh-Ching Chung
Computer Science, College of Engineering and Science, Louisiana Tech University, Ruston, LA, 71272, USA
Sumeet Dua & Vir V. Phoha
Department of Computer Sciences, Purdue University, W. Lafayette, IN, 47907, USA
Ananth Grama
Arizona State University, Tempe, AZ, 85281, USA
Sandeep K. S. Gupta
Computer Science and Engineering Department, Indian Institute of Technology Kharagpur, 721 302, WB, India
Rajeev Kumar

About this paper

Cite this paper

Gupta, D., Morris, B., Catapano, T., Sautter, G. (2009). A New Approach towards Bibliographic Reference Identification, Parsing and Inline Citation Matching. In: Ranka, S., et al. Contemporary Computing. IC3 2009. Communications in Computer and Information Science, vol 40. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03547-0_10

DOI: https://doi.org/10.1007/978-3-642-03547-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03546-3
Online ISBN: 978-3-642-03547-0
eBook Packages: Computer Science, Computer Science (R0)
