Abstract
Data citation has a profound impact on the reproducibility of science, a pressing topic in many disciplines such as astronomy, biology, physics, and computer science. Several authoritative journals (e.g., Nature Scientific Data and Nature Physics) have recently begun asking authors to share their data and to provide methodologies for validating their experiments; these journals, and the publishing industry in general, see data citation as the way to establish new, reliable, and usable mechanisms for sharing and referring to scientific data. In this paper, we present the state of the art of data citation and discuss open issues and research directions, with a specific focus on reproducibility. Furthermore, we investigate reproducibility issues using experimental evaluation in Information Retrieval (IR) as a test case. (This paper is a revised and extended version of [33, 35, 57].)
Notes
- 4. Actually, this would be difficult to achieve.
References
Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data, vol. 12. CODATA-ICSTI Task Group on Data Citation Standards and Practices, September 2013
Reproducibility and reliability of biomedical research: improving research practice. Technical report, The Academy of Medical Science (2015)
Freire, J., Fuhr, N., Rauber, A. (eds.): Report from Dagstuhl Seminar 16041: Reproducibility of Data-Oriented Experiments in e-Science. Dagstuhl Reports, vol. 6, no. 1. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Germany (2016)
Agosti, M., Di Buccio, E., Ferro, N., Masiero, I., Peruzzo, S., Silvello, G.: DIRECTions: design and specification of an IR evaluation infrastructure. In: Catarci, T., Forner, P., Hiemstra, D., Peñas, A., Santucci, G. (eds.) CLEF 2012. LNCS, vol. 7488, pp. 88–99. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33247-0_11
Agosti, M., Di Nunzio, G.M., Ferro, N.: The importance of scientific data curation for evaluation campaigns. In: Thanos, C., Borri, F., Candela, L. (eds.) DELOS 2007. LNCS, vol. 4877, pp. 157–166. Springer, Heidelberg (2007). doi:10.1007/978-3-540-77088-6_15
Agosti, M., Ferro, N.: Towards an evaluation infrastructure for DL performance evaluation. In: Tsakonas, G., Papatheodorou, C. (eds.) Evaluation of Digital Libraries: An Insight into Useful Applications and Methods, pp. 93–120. Chandos Publishing, Oxford (2009)
Alonso, O., Mizzaro, S.: Using crowdsourcing for TREC relevance assessment. Inf. Process. Manage. 48(6), 1053–1066 (2012)
Altman, M., Crosas, M.: The evolution of data citation: from principles to implementation. IAssist Q. 37(1–4), 62–70 (2013)
Altman, M., King, G.: A proposed standard for the scholarly citation of quantitative data. IASSIST (2006). http://www.iassistdata.org/conferences/archive/2006
Amigó, E., Corujo, A., Gonzalo, J., Meij, E., de Rijke, M.: Overview of RepLab 2012: evaluating online reputation management systems. In: Forner, P., Karlgren, J., Womser-Hacker, C., Ferro, N. (eds.) CLEF 2012 Working Notes. CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073 (2012). http://ceur-ws.org/Vol-1178/
Angelini, M., Ferro, N., Larsen, B., Müller, H., Santucci, G., Silvello, G., Tsikrika, T.: Measuring and analyzing the scholarly impact of experimental evaluation initiatives. Procedia Comput. Sci. 38, 133–137 (2014)
Arguello, J., Crane, M., Diaz, F., Lin, J., Trotman, A.: Report on the SIGIR 2015 workshop on reproducibility, inexplicability, and generalizability of results (RIGOR). SIGIR Forum 49(2), 107–116 (2015)
Armstrong, T.G., Moffat, A., Webber, W., Zobel, J.: EvaluatIR: an online tool for evaluating and comparing IR systems. In: Allan, J., Aslam, J.A., Sanderson, M., Zhai, C., Zobel, J. (eds.) Proceedings of 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009), USA, p. 833. ACM, New York (2009)
Badan, A., Benvegnù, L., Biasetton, M., Bonato, G., Brighente, A., Cenzato, A., Ceron, P., Cogato, G., Marchesin, S., Minetto, A., Pellegrina, L., Purpura, A., Simionato, R., Soleti, N., Tessarotto, M., Tonon, A., Vendramin, F., Ferro, N.: Towards open-source shared implementations of keyword-based access systems to relational data. In: Ferro, N., Guerra, F., Ives, Z., Silvello, G., Theobald, M. (eds.) Proceedings of 1st International Workshop on Keyword-Based Access and Ranking at Scale (KARS 2017) - Proceedings of the Workshops of the EDBT/ICDT 2017 Joint Conference (EDBT/ICDT 2017). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073 (2017). http://ceur-ws.org/Vol-1810/
Badan, A., Benvegnù, L., Biasetton, M., Bonato, G., Brighente, A., Marchesin, S., Minetto, A., Pellegrina, L., Purpura, A., Simionato, R., Soleti, N., Tessarotto, M., Tonon, A., Ferro, N.: Keyword-based access to relational data: to reproduce, or to not reproduce? In: Greco et al. [39]
Baggerly, K.: Disclose all data in publications. Nature 467, 401 (2010)
Bardi, A., Manghi, P.: A framework supporting the shift from traditional digital publications to enhanced publications. D-Lib Magaz. 21(1/2) (2015). http://dx.doi.org/10.1045/january2015-bardi
Bloom, T., Ganly, E., Winker, M.: Data access for the open access literature: PLOS’s data policy. PLoS Biol. 12(2), e1001797 (2014)
Borgman, C.L.: The conundrum of sharing research data. JASIST 63(6), 1059–1078 (2012). http://dx.doi.org/10.1002/asi.22634
Borgman, C.L.: Why are the attribution and citation of scientific data important? In: Board on Research Data and Information, Policy and Global Affairs Division, National Academy of Sciences (eds.) Report from Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop, pp. 1–8. National Academies Press, Washington DC (2012)
Borgman, C.L.: Big Data, Little Data, No Data. MIT Press, Cambridge (2015)
Buneman, P., Davidson, S.B., Frew, J.: Why data citation is a computational problem. Commun. ACM (CACM) 59(9), 50–57 (2016)
Buneman, P., Khanna, S., Tajima, K., Tan, W.C.: Archiving scientific data. ACM Trans. Database Syst. (TODS) 29(1), 2–42 (2004)
Buneman, P., Silvello, G.: A rule-based citation system for structured and evolving datasets. IEEE Data Eng. Bull. 33(3), 33–41 (2010). http://sites.computer.org/debull/A10sept/buneman.pdf
Burton, A., Koers, H., Manghi, P., La Bruzzo, S., Aryani, A., Diepenbroek, M., Schindler, U.: On bridging data centers and publishers: the data-literature interlinking service. In: Garoufallou, E., Hartley, R.J., Gaitanou, P. (eds.) MTSR 2015. CCIS, vol. 544, pp. 324–335. Springer, Cham (2015). doi:10.1007/978-3-319-24129-6_28
Candela, L., Castelli, D., Manghi, P., Tani, A.: Data journals: a survey. J. Assoc. Inf. Sci. Technol. 66(9), 1747–1762 (2015). http://dx.doi.org/10.1002/asi.23358
Carr, D., Littler, K.: Sharing research data to improve public health: a funder perspective. J. Empir. Res. Hum. Res. Ethics 10(3), 314–316 (2015)
Davidson, S.B., Deutch, D., Milo, T., Silvello, G.: A model for fine-grained data citation. In: Greco et al. [39]
Davidson, S.B., Deutch, D., Milo, T., Silvello, G.: A model for fine-grained data citation. In: 8th Biennial Conference on Innovative Data Systems Research (CIDR 2017) (2017)
Davidson, S.B., Buneman, P., Deutch, D., Milo, T., Silvello, G.: Data citation: a computational challenge. In: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2017), USA, pp. 1–4 (2017). http://doi.acm.org/10.1145/3034786.3056123
De Roure, D.: The future of scholarly communications. Insights 27(3), 233–238 (2014)
Dussin, M., Ferro, N.: Managing the knowledge creation process of large-scale evaluation campaigns. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds.) ECDL 2009. LNCS, vol. 5714, pp. 63–74. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04346-8_8
Ferro, N.: Reproducibility challenges in information retrieval evaluation. ACM J. Data Inf. Qual. (JDIQ) 8(2), 8:1–8:4 (2017)
Ferro, N., et al. (eds.): ECIR 2016. LNCS, vol. 9626. Springer, Cham (2016)
Ferro, N., Fuhr, N., Järvelin, K., Kando, N., Lippold, M., Zobel, J.: Increasing reproducibility in IR: findings from the Dagstuhl seminar on “reproducibility of data-oriented experiments in e-science”. SIGIR Forum 50(1), 68–82 (2016)
Ferro, N., Silvello, G.: Rank-biased precision reloaded: reproducibility and generalization. In: Hanbury et al. [41], pp. 768–780
FORCE-11: Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. FORCE11, San Diego, CA, USA (2014)
Freire, J., Bonnet, P., Shasha, D.: Computational reproducibility: state-of-the-art, challenges, and database research opportunities. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, pp. 593–596 (2012). http://doi.acm.org/10.1145/2213836.2213908
Greco, S., Saccà, D., Flesca, S., Masciari, E. (eds.): Proceedings of 25th Italian Symposium on Advanced Database Systems (SEBD 2017) (2017)
Groth, P., Gibson, A., Velterop, J.: The anatomy of a nanopublication. Inf. Serv. Use 30(1–2), 51–56 (2010)
Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.): ECIR 2015. LNCS, vol. 9022. Springer, Cham (2015). doi:10.1007/978-3-319-16354-3
Hanbury, A., Müller, H., Balog, K., Brodt, T., Cormack, G.V., Eggel, I., Gollub, T., Hopfgartner, F., Kalpathy-Cramer, J., Kando, N., Krithara, A., Lin, J., Mercer, S., Potthast, M.: Evaluation-as-a-service: overview and outlook. CoRR abs/1512.07454, December 2015
Harman, D.K.: Information Retrieval Evaluation. Morgan & Claypool Publishers, San Rafael (2011)
Hey, T., Tansley, S., Tolle, K. (eds.): The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, Redmond (2009)
Huang, Y.H., Rose, P.W., Hsu, C.N.: Citing a data repository: a case study of the protein data bank. PLoS ONE 10(8), e0136631 (2015)
Kanoulas, E., Lupu, M., Clough, P., Sanderson, M., Hall, M., Hanbury, A., Toms, E. (eds.): CLEF 2014. LNCS, vol. 8685. Springer, Cham (2014). doi:10.1007/978-3-319-11382-1
Klump, J., Huber, R., Diepenbroek, M.: DOI for geoscience data - how early practices shape present perceptions. Earth Sci. Inform. 1–14 (2015). http://dx.doi.org/10.1007/s12145-015-0231-5
Lipani, A., Piroi, F., Andersson, L., Hanbury, A.: An Information Retrieval Ontology for Information Retrieval Nanopublications. In: Kanoulas et al. [46], pp. 44–49
Papavasileiou, V., Flouris, G., Fundulaki, I., Kotzinos, D., Christophides, V.: High-level change detection in RDF(S) KBs. ACM Trans. Database Syst. 38(1), 1 (2013)
Potthast, M., Gollub, T., Rangel Pardo, F., Rosso, P., Stamatatos, E., Stein, B.: Improving the reproducibility of PAN’s shared tasks: plagiarism detection, author identification, and author profiling. In: Kanoulas et al. [46], pp. 268–299
Pröll, S., Rauber, A.: Scalable data citation in dynamic, large databases: model and reference implementation. In: Hu, X., Young, T.L., Raghavan, V., Wah, B.W., Baeza-Yates, R., Fox, G., Shahabi, C., Smith, M., Yang, Q., Ghani, R., Fan, W., Lempel, R., Nambiar, R. (eds.) Proceedings of the 2013 IEEE International Conference on Big Data, pp. 307–312. IEEE (2013)
Pröll, S., Rauber, A.: Asking the right questions - query-based data citation to precisely identify subsets of data. ERCIM News 100 (2015)
Robinson-Garcia, N., Jiménez-Contreras, E., Torres-Salinas, D.: Analyzing data citation practices according to the data citation index. J. Assoc. Inf. Sci. Technol. (JASIST) 67, 2964–2975 (2015)
Silvello, G.: A methodology for citing linked open data subsets. D-Lib Magaz. 21(1/2) (2015). http://dx.doi.org/10.1045/january2015-silvello
Silvello, G.: Learning to cite framework: how to automatically construct citations for hierarchical data. J. Assoc. Inf. Sci. Technol. (JASIST), 1–28 (2017)
Silvello, G., Bordea, G., Ferro, N., Buitelaar, P., Bogers, T.: Semantic representation and enrichment of information retrieval experimental data. Int. J. Digit. Libr. (IJDL) 18(2), 145–172 (2017)
Silvello, G., Ferro, N.: Data citation is coming. Introduction to the special issue on data citation. Bullet. IEEE Tech. Committee Digit. Libr. (IEEE-TCDL) 12(1), 1–5 (2016)
Simons, N.: Implementing DOIs for research data. D-Lib Magaz. 18(5/6) (2012). http://dx.doi.org/10.1045/may2012-simons
Vernooy-Gerritsen, M.: Enhanced Publications: Linking Publications and Research Data in Digital Repositories. Amsterdam University Press, Amsterdam (2009)
Voorhees, E.M.: Variations in relevance judgments and the measurement of retrieval effectiveness. Inf. Process. Manage. 36(5), 697–716 (2000)
Voorhees, E.M., Rajput, S., Soboroff, I.: Promoting repeatability through open runs. In: Yilmaz, E., Clarke, C.L.A. (eds.) Proceedings of 7th International Workshop on Evaluating Information Access (EVIA 2016), pp. 17–20. National Institute of Informatics, Tokyo, Japan (2016)
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ferro, N., Silvello, G. (2017). The Road Towards Reproducibility in Science: The Case of Data Citation. In: Grana, C., Baraldi, L. (eds) Digital Libraries and Archives. IRCDL 2017. Communications in Computer and Information Science, vol 733. Springer, Cham. https://doi.org/10.1007/978-3-319-68130-6_2
DOI: https://doi.org/10.1007/978-3-319-68130-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68129-0
Online ISBN: 978-3-319-68130-6
eBook Packages: Computer Science, Computer Science (R0)