
Midas: Towards an Interactive Data Catalog

  • Conference paper
Heterogeneous Data Management, Polystores, and Analytics for Healthcare (DMAH 2019, Poly 2019)

Abstract

This paper presents ongoing work on the Midas polystore system. The system combines data cataloging features with ad-hoc query capabilities and is specifically tailored to support agile data science teams that have to handle large datasets in a heterogeneous data landscape. Midas consists of a distributed SQL-based query engine and a web application for managing and virtualizing datasets. It differs from prior systems in its ability to provide attribute-level lineage using graph-based virtualization, sophisticated metadata management, and query offloading on virtualized datasets.
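To make the notion of attribute-level lineage over a graph of virtualized datasets more concrete, the following is a minimal sketch and not the Midas implementation. It assumes that each attribute is identified by a (dataset, attribute) pair and that a virtual dataset records which source attributes each of its attributes is derived from. The class name `LineageGraph`, its methods, and the example dataset names are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical attribute-level lineage graph (illustrative only).

    Nodes are (dataset, attribute) pairs. An edge points from a derived
    attribute to each attribute it was computed from.
    """

    def __init__(self):
        # Maps a derived attribute to the set of attributes it depends on.
        self._parents = defaultdict(set)

    def add_derivation(self, derived, sources):
        """Record that `derived` is computed from the attributes in `sources`."""
        self._parents[derived].update(sources)

    def base_attributes(self, attribute):
        """Return the attributes with no recorded parents that `attribute` depends on."""
        seen = set()
        bases = set()
        stack = [attribute]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            parents = self._parents.get(node)
            if not parents:
                # No recorded derivation: treat as a physical source attribute.
                bases.add(node)
            else:
                stack.extend(parents)
        return bases


# Assumed example: two virtual datasets layered over one physical table.
graph = LineageGraph()
graph.add_derivation(("patients_v", "age"), [("patients_raw", "birth_date")])
graph.add_derivation(("report_v", "age_group"), [("patients_v", "age")])
graph.add_derivation(
    ("report_v", "full_name"),
    [("patients_raw", "first_name"), ("patients_raw", "last_name")],
)

print(graph.base_attributes(("report_v", "age_group")))
```

In this sketch, tracing `("report_v", "age_group")` walks through the intermediate virtual dataset and ends at `("patients_raw", "birth_date")`. A system with query offloading could, under similar assumptions, use such a trace to decide which underlying source a query on a virtual dataset needs to reach.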

Author information

Corresponding author

Correspondence to Patrick Holl.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Holl, P., Gossling, K. (2019). Midas: Towards an Interactive Data Catalog. In: Gadepally, V., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2019, Poly 2019. Lecture Notes in Computer Science, vol 11721. Springer, Cham. https://doi.org/10.1007/978-3-030-33752-0_9

  • DOI: https://doi.org/10.1007/978-3-030-33752-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33751-3

  • Online ISBN: 978-3-030-33752-0

  • eBook Packages: Computer Science, Computer Science (R0)
