Skip to main content

Identifying References to Datasets in Publications

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7489))

Abstract

Research data and publications are usually stored in separate and structurally distinct information systems. Often, links between these resources are not explicitly available which complicates the search for previous research. In this paper, we propose a pattern induction method for the detection of study references in full texts. Since these references are not specified in a standardized way and may occur inside a variety of different contexts – i.e., captions, footnotes, or continuous text – our algorithm is required to induce very flexible patterns. To overcome the sparse distribution of training instances, we induce patterns iteratively using a bootstrapping approach. We show that our method achieves promising results for the automatic identification of data references and is a first step towards building an integrated information system.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Afzal, M.T., Maurer, H., Balke, W.T., Kulathuramaiyer, N.: Rule based autonomous citation mining with tierl. Journal of Digital Information Management 8(3), 196–204 (2010)

    Google Scholar 

  2. Councill, I.G., Giles, C.L., Kan, M.Y.: Parscit: An open-source crf reference string parsing package. In: Proceedings of the Language Resources and Evaluation Conference, European Language Resources Association (2008)

    Google Scholar 

  3. Gipp, B., Beel, J.: Citation Proximity Analysis (CPA) - A new approach for identifying related work based on Co-Citation Analysis. In: Proceedings of the 12th International Conference on Scientometrics and Informetrics, vol. 2, pp. 571–575 (2009)

    Google Scholar 

  4. Kubala, F., Schwartz, R., Stone, R., Weischedel, R.: Named entity extraction from speech. In: DARPA Workshop on Broadcast News Understanding Systems, pp. 287–292 (1998)

    Google Scholar 

  5. Etzioni, O., Cafarella, M., Downey, D., Popescu, A.M., Shaked, T., Soderland, S., Weld, D.S., Yates, A.: Unsupervised named-entity extraction from the web: An experimental study. Artificial Intelligence 165, 91–134 (2005)

    Article  Google Scholar 

  6. Hearst, M.: Automatic acquisition of hyponyms from large text corpora. In: Proceedings of the 14th Conference on Computational Linguistics, pp. 539–545. Association for Computational Linguistics, Stroudsburg (1992)

    Chapter  Google Scholar 

  7. Berland, M., Charniak, E.: Finding parts in very large corpora. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, pp. 57–64. Association for Computational Linguistics, Stroudsburg (1999)

    Chapter  Google Scholar 

  8. Pennacchiotti, M., Pantel, P.: A bootstrapping algorithm for automatically harvesting semantic relations. In: Proceedings of the Inference in Computational Semantics, pp. 87–96 (2006)

    Google Scholar 

  9. Meusel, R., Niepert, M., Eckert, K., Stuckenschmidt, H.: Thesaurus Extension Using Web Search Engines. In: Chowdhury, G., Koo, C., Hunter, J. (eds.) ICADL 2010. LNCS, vol. 6102, pp. 198–207. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  10. Thelen, M., Riloff, E.: A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In: Proceedings of the 2002 Conference on Empirical Methods in NLP, pp. 214–221. Association for Computational Linguistics, Stroudsburg (2002)

    Google Scholar 

  11. Xu, R., Morgan, A., Das, A.K., Garber, A.: Investigation of unsupervised pattern learning techniques for bootstrap construction of a medical treatment lexicon. In: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, BioNLP 2009, pp. 63–70. Association for Computational Linguistics, Stroudsburg (2009)

    Google Scholar 

  12. Altman, M., King, G.: A proposed standard for the scholarly citation of quantitative data. D-Lib Magazine 13(3) (2007)

    Google Scholar 

  13. Green, T.: We need publishing standards for datasets and data tables. OECD Publishing White Paper (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boland, K., Ritze, D., Eckert, K., Mathiak, B. (2012). Identifying References to Datasets in Publications. In: Zaphiris, P., Buchanan, G., Rasmussen, E., Loizides, F. (eds) Theory and Practice of Digital Libraries. TPDL 2012. Lecture Notes in Computer Science, vol 7489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33290-6_17

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-33290-6_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33289-0

  • Online ISBN: 978-3-642-33290-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics