DOI: 10.1145/3569951.3597572
short-paper

Airavata Data Catalog: A Multi-tenant Metadata Service for Efficient Data Discovery and Access Control

Published: 10 September 2023

ABSTRACT

Metadata catalogs are essential for enabling researchers to find and access relevant datasets. However, existing metadata catalog solutions have limitations: some are domain-specific, and others rely on document-oriented databases that limit their scalability and flexibility. To address these issues, we introduce the Airavata Data Catalog, a multi-tenant, schema-free metadata catalog service that supports multiple domain-specific metadata schemas and access control mechanisms. This paper describes our approach to modeling the relational attributes of data products and their access control while supporting a schema-free, document-oriented approach to storing and searching metadata. Our approach offers significant improvements over existing solutions and demonstrates the feasibility of a scalable, flexible metadata catalog for scientific datasets.
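As an illustration of the idea in the abstract, the following is a minimal in-memory sketch of a catalog that keeps a few fixed, relational attributes per data product (tenant, owner, sharing) alongside a free-form metadata document, and filters searches by both. It is not the Airavata Data Catalog API: the names `DataProduct`, `ToyCatalog`, `share`, and `search`, as well as the tenant names and metadata fields, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataProduct:
    """Hypothetical data product: fixed attributes plus a schema-free document."""
    product_id: str
    tenant_id: str
    owner: str
    name: str
    metadata: dict[str, Any] = field(default_factory=dict)  # free-form metadata
    shared_with: set[str] = field(default_factory=set)      # users with read access


class ToyCatalog:
    """Illustrative in-memory catalog; a real service would use a database."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def add(self, product: DataProduct) -> None:
        self._products[product.product_id] = product

    def share(self, product_id: str, user: str) -> None:
        """Grant read access on one product to one user."""
        self._products[product_id].shared_with.add(user)

    def search(self, tenant_id: str, user: str, **criteria: Any) -> list[DataProduct]:
        """Return products in the tenant that the user may read and whose
        metadata equals every given key/value criterion."""
        results = []
        for product in self._products.values():
            if product.tenant_id != tenant_id:
                continue  # tenant isolation
            if user != product.owner and user not in product.shared_with:
                continue  # access control
            if all(product.metadata.get(k) == v for k, v in criteria.items()):
                results.append(product)
        return results


catalog = ToyCatalog()
catalog.add(DataProduct("dp1", "chem-gateway", "alice", "dye-run-01",
                        {"basis_set": "6-31G", "converged": True}))
catalog.add(DataProduct("dp2", "chem-gateway", "bob", "dye-run-02",
                        {"basis_set": "6-31G"}))
catalog.add(DataProduct("dp3", "bio-gateway", "alice", "genome-01",
                        {"organism": "E. coli"}))
catalog.share("dp2", "alice")

hits = catalog.search("chem-gateway", "alice", basis_set="6-31G")
print([p.name for p in hits])
```

In this sketch, the two products in different tenants can carry entirely different metadata keys, while tenant and sharing checks are applied uniformly before any metadata criterion is evaluated.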


Published in

PEARC '23: Practice and Experience in Advanced Research Computing
July 2023
519 pages
ISBN: 9781450399852
DOI: 10.1145/3569951

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• short-paper
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 133 of 202 submissions, 66%
