DOI: 10.1145/3491418.3530297
research-article

Scholarly Data Share: A Model for Sharing Big Data in Academic Research

Published: 08 July 2022

Abstract

The Scholarly Data Share (SDS) is a lightweight web interface that facilitates access to large, curated research datasets stored in a tape archive. SDS addresses common needs of research teams that work with and manage large, complex datasets and the storage that holds them. To the standard tape storage offering, the service adds several features of particular value to the research community: (1) the ability to capture and manage metadata, (2) metadata-driven browsing and retrieval over a web interface, (3) reliable and scalable asynchronous data transfers, and (4) an interface that hides the complexity of the underlying storage and access infrastructure. SDS is designed to be easy to implement and sustain over time: it builds on existing tool chains and proven open-source software and minimizes bespoke code and domain-specific customization. In this paper, we describe the development of SDS and an instance implemented to provide access to a large collection of geospatial datasets.
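The abstract names metadata-driven browsing and reliable asynchronous transfers as two of the features SDS layers over tape storage. The following Python sketch illustrates that general pattern under stated assumptions; it is not the SDS implementation, which the abstract does not describe. `DatasetRecord`, `search`, `TransferService`, and the `stage_file` hook are hypothetical names, the in-memory list stands in for a real metadata store, and a single worker thread stands in for a scalable transfer backend.

```python
from __future__ import annotations

import queue
import threading
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry: an archive path plus descriptive metadata."""
    archive_path: str
    metadata: dict[str, str]


def search(catalog: list[DatasetRecord], **criteria: str) -> list[DatasetRecord]:
    """Metadata-driven browsing: keep records whose metadata matches every criterion."""
    return [
        rec for rec in catalog
        if all(rec.metadata.get(key) == value for key, value in criteria.items())
    ]


@dataclass
class TransferJob:
    """One asynchronous retrieval request covering one or more archived files."""
    job_id: int
    paths: list[str]
    status: str = "queued"
    done: threading.Event = field(default_factory=threading.Event)


class TransferService:
    """Accepts retrieval requests immediately and stages files on a background thread.

    `stage_file` is a hypothetical hook standing in for whatever actually
    recalls a file from tape; it is not an SDS or archive API.
    """

    def __init__(self, stage_file: Callable[[str], None]) -> None:
        self._stage_file = stage_file
        self._queue: queue.Queue[TransferJob] = queue.Queue()
        self._jobs: dict[int, TransferJob] = {}
        self._next_id = 1
        self._lock = threading.Lock()
        # Daemon worker so the demo process can exit without an explicit shutdown.
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, records: list[DatasetRecord]) -> int:
        """Queue a job and return its id without waiting for any tape I/O."""
        with self._lock:
            job = TransferJob(self._next_id, [rec.archive_path for rec in records])
            self._jobs[job.job_id] = job
            self._next_id += 1
        self._queue.put(job)
        return job.job_id

    def status(self, job_id: int) -> str:
        return self._jobs[job_id].status

    def wait(self, job_id: int, timeout: Optional[float] = None) -> bool:
        """Block until the job finishes or the timeout expires; True if it finished."""
        return self._jobs[job_id].done.wait(timeout)

    def _worker(self) -> None:
        while True:
            job = self._queue.get()
            job.status = "staging"
            try:
                for path in job.paths:
                    self._stage_file(path)
                job.status = "complete"
            except Exception:
                # A real service would record the error and allow retries.
                job.status = "failed"
            finally:
                job.done.set()
                self._queue.task_done()


if __name__ == "__main__":
    # Hypothetical catalog of geospatial imagery; paths and fields are made up.
    catalog = [
        DatasetRecord("/archive/imagery/2016/tile_001.tif", {"county": "Marion", "year": "2016"}),
        DatasetRecord("/archive/imagery/2016/tile_002.tif", {"county": "Monroe", "year": "2016"}),
        DatasetRecord("/archive/imagery/2018/tile_003.tif", {"county": "Marion", "year": "2018"}),
    ]
    staged: list[str] = []
    service = TransferService(stage_file=staged.append)
    job_id = service.submit(search(catalog, county="Marion"))
    service.wait(job_id, timeout=5)
    print(service.status(job_id), staged)
```

Returning a job id from `submit` right away, rather than blocking until a tape recall finishes, is one way to keep a web front end responsive during long retrievals; clients can then poll `status` or call `wait` to learn when the files are ready.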

Cited By

  • (2023) Scholarly Data Share 2.0: Granular Access to Research Data. In Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good, 177–180. DOI: 10.1145/3569951.3597585. Online publication date: 23-Jul-2023.

Published In

ACM Conferences
PEARC '22: Practice and Experience in Advanced Research Computing 2022: Revolutionary: Computing, Connections, You
July 2022
455 pages
ISBN:9781450391610
DOI:10.1145/3491418
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. collections
  2. datasets
  3. geospatial data
  4. metadata
  5. research data sharing

Qualifiers

  • Research
  • Refereed limited

Conference

PEARC '22

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

Bibliometrics

  • Downloads (last 12 months): 34
  • Downloads (last 6 weeks): 2
Reflects downloads up to 22 Feb 2025
