Identifying References to Datasets in Publications

Boland, Katarina; Ritze, Dominique; Eckert, Kai; Mathiak, Brigitte

doi:10.1007/978-3-642-33290-6_17


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7489)

Abstract

Research data and publications are usually stored in separate and structurally distinct information systems. Often, links between these resources are not explicitly available, which complicates the search for previous research. In this paper, we propose a pattern induction method for the detection of study references in full texts. Since these references are not specified in a standardized way and may occur in a variety of contexts (i.e., captions, footnotes, or continuous text), our algorithm is required to induce very flexible patterns. To overcome the sparse distribution of training instances, we induce patterns iteratively using a bootstrapping approach. We show that our method achieves promising results for the automatic identification of data references and is a first step towards building an integrated information system.
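The abstract describes the approach only at a high level. To illustrate the general idea of iterative pattern induction by bootstrapping, here is a minimal sketch of a generic bootstrapping loop. It is not the authors' algorithm. Known dataset names are used to induce context patterns, and those patterns are then applied to the corpus to harvest new names, which seed the next round. The toy corpus, the survey names, the assumption that a name is a single token, the fixed two-token context window and the `min_support` threshold are all illustrative assumptions; the helper names `induce_patterns`, `apply_pattern` and `bootstrap` are hypothetical.

```python
from collections import defaultdict


def induce_patterns(sentences, terms, window=2):
    """Map each (left, right) context around a known term to the set of terms it surrounds."""
    support = defaultdict(set)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            # Only use occurrences with a full context window on both sides.
            if tok in terms and i >= window and i + window < len(tokens):
                left = tuple(tokens[i - window:i])
                right = tuple(tokens[i + 1:i + 1 + window])
                support[(left, right)].add(tok)
    return support


def apply_pattern(sentences, pattern, window=2):
    """Return every token that fills the slot of `pattern` somewhere in the corpus."""
    left, right = pattern
    found = set()
    for tokens in sentences:
        for i in range(window, len(tokens) - window):
            if tuple(tokens[i - window:i]) == left and tuple(tokens[i + 1:i + 1 + window]) == right:
                found.add(tokens[i])
    return found


def bootstrap(sentences, seeds, window=2, min_support=1, max_iter=5):
    """Alternate between inducing patterns from known names and harvesting new names."""
    terms = set(seeds)
    patterns = set()
    for _ in range(max_iter):
        support = induce_patterns(sentences, terms, window)
        # Keep patterns seen around at least `min_support` distinct known names.
        new_patterns = {p for p, ts in support.items() if len(ts) >= min_support} - patterns
        patterns |= new_patterns
        new_terms = set()
        for p in new_patterns:
            new_terms |= apply_pattern(sentences, p, window)
        new_terms -= terms
        if not new_terms:  # converged: no name was added in this round
            break
        terms |= new_terms
    return terms, patterns


# Hypothetical toy corpus: whitespace-tokenized sentences, single-token survey names.
corpus = [s.split() for s in [
    "we use data from the ALLBUS survey of 2008",
    "results rely on data from the ISSP survey of 2010",
    "see the ISSP codebook for details",
    "see the EVS codebook for details",
]]

terms, patterns = bootstrap(corpus, seeds={"ALLBUS"})
print(sorted(terms))  # → ['ALLBUS', 'EVS', 'ISSP']
```

On this toy corpus, the seed ALLBUS yields the pattern "from the _ survey of", which finds ISSP; ISSP in turn yields "see the _ codebook for", which finds EVS. A realistic system would also need some reliability score for patterns to limit semantic drift; raising `min_support` is only a crude stand-in for that.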



Author information

Authors and Affiliations

GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
Katarina Boland & Brigitte Mathiak
Mannheim University Library, Mannheim, Germany
Dominique Ritze & Kai Eckert


Editor information

Editors and Affiliations

Department of Multimedia and Graphic Arts, Cyprus University of Technology, 3036, Limassol, Cyprus
Panayiotis Zaphiris & Fernando Loizides
School of Informatics, City University of London, Northampton Square, EC1V 0HB, London, UK
George Buchanan
School of Library, Archival and Information Studies, Irving K. Barber Learning Centre, The University of British Columbia, V6T 1Z3, Vancouver, BC, Canada
Edie Rasmussen


About this paper

Cite this paper

Boland, K., Ritze, D., Eckert, K., Mathiak, B. (2012). Identifying References to Datasets in Publications. In: Zaphiris, P., Buchanan, G., Rasmussen, E., Loizides, F. (eds) Theory and Practice of Digital Libraries. TPDL 2012. Lecture Notes in Computer Science, vol 7489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33290-6_17


DOI: https://doi.org/10.1007/978-3-642-33290-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33289-0
Online ISBN: 978-3-642-33290-6
eBook Packages: Computer Science, Computer Science (R0)
