DOI: 10.1145/3311790.3396660

An Open Ecosystem for Pervasive Use of Persistent Identifiers

Published: 26 July 2020

Abstract

Persistent identifiers (PIDs) are essential for making data Findable, Accessible, Interoperable, and Reusable, or FAIR. While the advantages of PIDs for data publication and citation are well understood, and Digital Object Identifiers (DOIs) are increasingly applied to data, there are two gaps in the current identifier ecosystem: 1) services that provide a consistent baseline of capabilities encompassing key aspects of the research data lifecycle, including canonical landing pages and machine-readable metadata via the same URL; and 2) support for identifiers to be applied to ephemeral data, particularly as data move across system boundaries, such as during workflows. To address these gaps, we have implemented the FAIR Research Identifiers service. This service supports multiple identifier providers (ARK, Handle, DOIs via DataCite, etc.) and uses Globus Auth to implement a rich user- and group-based authorization model for identifier creation. This paper summarizes the current identifier ecosystem, presents best-practice recommendations for identifier use, and describes our FAIR Research Identifiers service.
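As a rough illustration of the first gap (a landing page and machine-readable metadata served from the same URL), the sketch below maps compact identifiers from the three providers named above to public resolver URLs and builds a request that asks for metadata through HTTP content negotiation. This is not the FAIR Research Identifiers service API. The function names, the scheme-to-resolver mapping, and the `application/ld+json` media type are assumptions chosen for the example, and whether a given resolver honors that media type depends on the provider.

```python
from urllib.request import Request

# Assumed scheme-to-resolver mapping, for illustration only.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:/",
}


def resolver_url(identifier: str) -> str:
    """Turn a compact identifier such as 'hdl:1813/3688' into a resolver URL."""
    scheme, sep, value = identifier.strip().partition(":")
    scheme = scheme.lower()
    value = value.lstrip("/")  # ARKs are commonly written 'ark:/NAAN/name'
    if not sep or not value or scheme not in RESOLVERS:
        raise ValueError(f"unsupported or malformed identifier: {identifier!r}")
    return RESOLVERS[scheme] + value


def metadata_request(identifier: str, media_type: str = "application/ld+json") -> Request:
    """Build (but do not send) a request asking the same URL for metadata.

    A browser hitting this URL would get the human-readable landing page;
    the Accept header is how a client asks for a machine-readable form instead.
    The default media type is a hypothetical choice, not a documented one.
    """
    return Request(resolver_url(identifier), headers={"Accept": media_type})
```

Passing the returned `Request` to `urllib.request.urlopen` would perform the actual lookup; the sketch stops short of that so it can run without network access.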

Published In

PEARC '20: Practice and Experience in Advanced Research Computing 2020: Catch the Wave
July 2020
556 pages
ISBN: 9781450366892
DOI: 10.1145/3311790
© 2020 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. datasets
  2. identifiers

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • NIH Common Fund
  • U.S. Department of Energy

Conference

PEARC '20

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%
