ABSTRACT
Publishing and sharing data is critical to fostering collaboration and advancing scientific research. Data portals are commonly used to organize, publish, and securely disseminate data, a critical step toward making data findable, accessible, interoperable, and reusable (FAIR). However, the diversity of scientific data types, sizes, and locations presents significant challenges; for example, portals that rely on strict metadata schemas and rigid interfaces struggle to accommodate heterogeneous research products. Thus, there is a need for a user-customizable data portal solution that enables rapid creation of new portals tailored to a researcher's needs while accommodating distributed data sources and engaging advanced computing resources. In this paper, we present the Django Globus Portal Framework (DGPF), a tool designed to help users rapidly create secure, customizable, and extensible data portals. DGPF is a powerful and flexible framework that builds upon the Globus platform for authentication, data sharing, creation of automation flows, and search capabilities, allowing for seamless integration with existing research workflows. We present the design and implementation of DGPF and describe our experiences operating the Argonne Community Data Co-op (ACDC), a collection of DGPF portals with over 1 M records and over 100 TB of published data that has been accessed by more than 300 users.