A New Approach towards Bibliographic Reference Identification, Parsing and Inline Citation Matching

  • Conference paper
Contemporary Computing (IC3 2009)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 40)

Abstract

Several algorithms and approaches have been proposed for the problem of scanning and digitizing research papers. Prior work falls into three major approaches: regular-expression-based heuristics, learning-based algorithms, and knowledge-based systems. Our findings indicate that existing open-source solutions such as Paracite are inadequate for papers with “micro-citations” in various European languages. This paper describes work done as part of the Google Summer of Code 2008, combining regular-expression-based heuristics with knowledge-based systems to build a system that matches inline citations to their corresponding bibliographic references and identifies and extracts metadata from those references. We present the description, implementation, and results of our approach, which improves accuracy and yields better recognition rates.
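As an illustration of the regular-expression side of such a system, the following is a minimal sketch and not the authors' implementation, whose actual rules are not given in this abstract. It assumes numbered reference entries of the form "N. Authors: Title (Year)" and bracketed numeric inline citations such as "[1]" or "[7, 3]". The names `REFERENCE_RE`, `YEAR_RE`, `INLINE_RE`, `parse_reference`, and `match_citations` are hypothetical.

```python
import re

# Assumed entry layout: "<index>. <authors>: <rest>". This is a hypothetical
# convention for the sketch, not the pattern set used in the paper.
REFERENCE_RE = re.compile(r"^\s*(?P<index>\d+)\.\s+(?P<authors>.+?):\s+(?P<rest>.+)$")
# A four-digit year in parentheses, e.g. "(2003)".
YEAR_RE = re.compile(r"\((?P<year>(?:19|20)\d{2})\)")
# Bracketed numeric inline citations, e.g. "[1]" or "[7, 3]".
INLINE_RE = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def parse_reference(line):
    """Extract index, authors and year from one reference line, or return None."""
    m = REFERENCE_RE.match(line)
    if m is None:
        return None
    year = YEAR_RE.search(m.group("rest"))
    return {
        "index": int(m.group("index")),
        "authors": m.group("authors"),
        "year": int(year.group("year")) if year else None,
        "raw": line.strip(),
    }


def match_citations(body, reference_lines):
    """Pair each inline citation number in body with its parsed reference.

    Returns a list of (citation_text, reference_or_None) tuples; None marks
    a citation number with no parsed reference entry.
    """
    refs = {}
    for line in reference_lines:
        parsed = parse_reference(line)
        if parsed is not None:
            refs[parsed["index"]] = parsed
    matches = []
    for m in INLINE_RE.finditer(body):
        for number in re.split(r"\s*,\s*", m.group(1)):
            matches.append((m.group(0), refs.get(int(number))))
    return matches
```

A purely pattern-based matcher like this one only handles the citation style it was written for. Author-year citations or abbreviated "micro-citations" would need additional patterns or a knowledge-based component of the kind the abstract mentions.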

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gupta, D., Morris, B., Catapano, T., Sautter, G. (2009). A New Approach towards Bibliographic Reference Identification, Parsing and Inline Citation Matching. In: Ranka, S., et al. Contemporary Computing. IC3 2009. Communications in Computer and Information Science, vol 40. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03547-0_10

  • DOI: https://doi.org/10.1007/978-3-642-03547-0_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03546-3

  • Online ISBN: 978-3-642-03547-0

  • eBook Packages: Computer Science, Computer Science (R0)
