DOI: 10.1145/3569951.3593597
research-article

Active Research Data Management with the Django Globus Portal Framework

Published: 10 September 2023

ABSTRACT

Publishing and sharing data is critical to fostering collaboration and advancing scientific research. Data portals are commonly used to organize, publish, and securely disseminate data, a critical step toward making data findable, accessible, interoperable, and reusable (FAIR). However, the diversity of scientific data types, sizes, and locations presents significant challenges; for example, portals that rely on strict metadata schemas and rigid interfaces struggle to accommodate heterogeneous research products. Thus, there is a need for a user-customizable data portal solution that enables the rapid creation of new portals tailored to a researcher's needs while accommodating distributed data sources and engaging advanced computing resources. In this paper, we present the Django Globus Portal Framework (DGPF), a tool designed to help users rapidly create secure, customizable, and extensible data portals. DGPF builds upon the Globus platform for authentication, data sharing, automation flows, and search, allowing seamless integration with existing research workflows. We present the design and implementation of DGPF and describe our experiences operating the Argonne Community Data Co-op (ACDC), a collection of DGPF portals with over 1 million records and over 100 TB of published data that has been accessed by more than 300 users.
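To make the search side of this design more concrete, here is a minimal sketch of how a DGPF-style portal might turn a user's query and facet selections into a request for the Globus Search service. It is an illustration under stated assumptions, not DGPF's implementation: the `SEARCH_INDEXES` dictionary only loosely mirrors the shape of the setting of that name in the DGPF documentation, and the index slug, UUID placeholder, metadata field names, and the `build_search_request` helper are all hypothetical. The request keys (`q`, `limit`, `offset`, `facets`, `filters`) and the `match_any` filter type follow our reading of the Globus Search query format and should be checked against the Search API reference before use.

```python
from __future__ import annotations

from typing import Any

# Hypothetical portal configuration. The layout loosely mirrors DGPF's
# SEARCH_INDEXES setting; the slug, UUID placeholder, and field names
# are illustrative assumptions, not values from the paper.
SEARCH_INDEXES: dict[str, dict[str, Any]] = {
    "example-portal": {
        "name": "Example Data Portal",
        "uuid": "<globus-search-index-uuid>",  # placeholder, not a real index
        "facets": [
            {"name": "Subject", "field_name": "dc.subjects.subject",
             "type": "terms", "size": 10},
            {"name": "Format", "field_name": "dc.formats",
             "type": "terms", "size": 5},
        ],
    }
}


def build_search_request(
    index_slug: str,
    query: str,
    selected: dict[str, list[str]],
    limit: int = 10,
    offset: int = 0,
) -> dict[str, Any]:
    """Build a Globus Search-style request body (hypothetical helper).

    `selected` maps a configured facet field to the values a user ticked.
    Each non-empty selection becomes a match_any filter, so a record passes
    if it carries at least one of the chosen values for that field.
    """
    index = SEARCH_INDEXES[index_slug]
    allowed = {facet["field_name"] for facet in index["facets"]}
    filters: list[dict[str, Any]] = []
    for field, values in selected.items():
        if field not in allowed:
            raise ValueError(f"{field!r} is not a configured facet field")
        if values:  # ignore facets with nothing selected
            filters.append(
                {"type": "match_any", "field_name": field, "values": list(values)}
            )
    return {
        "q": query or "*",  # assumed match-all query when the box is empty
        "limit": limit,
        "offset": offset,
        "facets": index["facets"],  # ask the service to compute these facets
        "filters": filters,
    }


if __name__ == "__main__":
    body = build_search_request(
        "example-portal", "crystallography", {"dc.formats": ["HDF5"]}
    )
    print(body["filters"])
```

Running the script prints `[{'type': 'match_any', 'field_name': 'dc.formats', 'values': ['HDF5']}]`. Rejecting facet fields that the portal administrator did not configure is one way to keep a customizable search interface bounded to the metadata a given portal actually publishes.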


  • Published in

    PEARC '23: Practice and Experience in Advanced Research Computing
    July 2023
    519 pages
    ISBN: 9781450399852
    Proceedings DOI: 10.1145/3569951

    Copyright © 2023 ACM

    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher: Association for Computing Machinery, New York, NY, United States



    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall acceptance rate: 133 of 202 submissions (66%)

