
Evolution Management of Multi-model Data

(Position Paper)

Conference paper

Heterogeneous Data Management, Polystores, and Analytics for Healthcare (DMAH 2019, Poly 2019)

Abstract

The variety of data is one of the most challenging issues in data management research and practice. So-called multi-model data are naturally organized in different but mutually linked formats and models, including structured, semi-structured, and unstructured ones. In this position paper, we discuss an aspect that has so far been neglected but is important for real-world applications: evolution management of multi-model data. We provide a motivating scenario and discuss the key related challenges, such as multi-model data modelling, intra- vs. inter-model changes, global and local evolution operations, eager vs. lazy migration, and schema inference.
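To make the eager vs. lazy migration distinction concrete, here is a minimal Python sketch of one intra-model change (renaming a property) in a toy key/value store whose values are documents. The `Store` class, the `rename` operation, and the `_v` schema-version field are hypothetical names chosen for illustration; this is not the authors' design, only one way to contrast the two strategies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A migration maps one record (a dict) to its evolved form.
# All names here are hypothetical and only illustrate the idea.
Migration = Callable[[Dict[str, Any]], Dict[str, Any]]


def rename(old: str, new: str) -> Migration:
    """Return a migration that renames property `old` to `new` if present."""
    def apply(record: Dict[str, Any]) -> Dict[str, Any]:
        record = dict(record)  # copy so the stored record is not mutated in place
        if old in record:
            record[new] = record.pop(old)
        return record
    return apply


@dataclass
class Store:
    """Toy key/value store; each value is a document tagged with a schema version `_v`."""
    data: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    migrations: List[Migration] = field(default_factory=list)

    @property
    def version(self) -> int:
        # The current schema version equals the number of registered migrations.
        return len(self.migrations)

    def put(self, key: str, doc: Dict[str, Any]) -> None:
        self.data[key] = {"_v": self.version, **doc}

    def evolve_eager(self, m: Migration) -> None:
        # Eager: register the migration and rewrite every stored record now
        # (this also applies any migrations still pending from lazy evolution).
        self.migrations.append(m)
        for key in list(self.data):
            self.data[key] = self._upgrade(self.data[key])

    def evolve_lazy(self, m: Migration) -> None:
        # Lazy: only register the migration; records are upgraded when they are read.
        self.migrations.append(m)

    def get(self, key: str) -> Dict[str, Any]:
        doc = self._upgrade(self.data[key])
        self.data[key] = doc  # write the upgraded record back to the store
        return {k: v for k, v in doc.items() if k != "_v"}

    def _upgrade(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        # Apply, in order, every migration newer than the record's version.
        for m in self.migrations[doc["_v"]:]:
            doc = m(doc)
        return {**doc, "_v": self.version}


if __name__ == "__main__":
    store = Store()
    store.put("c1", {"name": "Mary"})
    store.put("c2", {"name": "John"})
    store.evolve_lazy(rename("name", "fullName"))
    print(store.data["c2"])  # {'_v': 0, 'name': 'John'} (not yet migrated)
    print(store.get("c1"))   # {'fullName': 'Mary'}
```

In this sketch, eager migration pays the full rewrite cost when the schema changes, while lazy migration defers that cost to reads, so records in several schema versions may coexist in the store until each one is accessed.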


Notes

  1. Note that the UPDATE command should be applied to all key/value records.

  2. For a concrete implementation, the definitions of identifier and value must still be specified.



Acknowledgements

This work was partly supported by the German Research Foundation (Deutsche Forschungsgemeinschaft (DFG)), grant number 385808805 (M. Klettke, U. Störl) and the Charles University project PROGRES Q48 (I. Holubová). We want to thank Stefanie Scherzinger and Mark Lukas Möller for numerous interesting and helpful discussions and several comments on this work.

Author information

Correspondence to Irena Holubová.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Holubová, I., Klettke, M., Störl, U. (2019). Evolution Management of Multi-model Data. In: Gadepally, V., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2019, Poly 2019. Lecture Notes in Computer Science, vol 11721. Springer, Cham. https://doi.org/10.1007/978-3-030-33752-0_10


  • DOI: https://doi.org/10.1007/978-3-030-33752-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33751-3

  • Online ISBN: 978-3-030-33752-0

  • eBook Packages: Computer Science, Computer Science (R0)
