TDM: A Tensor Data Model for Logical Data Independence in Polystore Systems

  • Conference paper
Heterogeneous Data Management, Polystores, and Analytics for Healthcare (DMAH 2018, Poly 2018)

Abstract

This paper presents a Tensor Data Model (TDM) for achieving logical data independence in polystore systems. TDM is an expressive model that can link the data models of different data stores, and it simplifies data transformations by expressing them through operators with clearly defined semantics. Our contribution is the definition of a data model based on tensors, to which we add the notion of a typed schema using associative arrays. We describe a set of operators and show how the model constructs fit into a mediator/wrapper-like architecture.
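The abstract does not reproduce the formal definitions, so the following is only a minimal Python sketch of the general idea: a tensor whose dimensions are named and typed, with each dimension backed by an associative array that maps domain keys to integer indices. The names `Dimension`, `NamedTensor`, and `select`, and the user/hashtag/week example data, are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Sequence, Tuple


@dataclass
class Dimension:
    """Hypothetical typed dimension: an associative array from keys to indices."""
    name: str
    key_type: type
    index: Dict[Hashable, int] = field(default_factory=dict)

    def position(self, key: Hashable) -> int:
        # Enforce the declared key type, then assign the next free index on first use.
        if not isinstance(key, self.key_type):
            raise TypeError(f"{self.name} expects {self.key_type.__name__} keys")
        return self.index.setdefault(key, len(self.index))


class NamedTensor:
    """Hypothetical sparse tensor addressed by domain keys instead of raw integers."""

    def __init__(self, dims: List[Dimension]):
        self.dims = dims
        self.cells: Dict[Tuple[int, ...], float] = {}

    def set(self, keys: Sequence[Hashable], value: float) -> None:
        if len(keys) != len(self.dims):
            raise ValueError("one key per dimension is required")
        coords = tuple(d.position(k) for d, k in zip(self.dims, keys))
        self.cells[coords] = value

    def get(self, keys: Sequence[Hashable], default: float = 0.0) -> float:
        # Unknown keys map to -1, which never matches a stored coordinate.
        coords = tuple(d.index.get(k, -1) for d, k in zip(self.dims, keys))
        return self.cells.get(coords, default)

    def select(self, dim_name: str, key: Hashable) -> "NamedTensor":
        """Fix one dimension to a key and return the lower-order tensor."""
        axis = [d.name for d in self.dims].index(dim_name)
        fixed = self.dims[axis].index.get(key)
        rest = [d for i, d in enumerate(self.dims) if i != axis]
        out = NamedTensor(rest)
        for coords, value in self.cells.items():
            if coords[axis] == fixed:
                out.cells[coords[:axis] + coords[axis + 1:]] = value
        return out


# Illustrative data only: an order-3 tensor of user x hashtag x week counts.
users = Dimension("user", str)
tags = Dimension("hashtag", str)
weeks = Dimension("week", int)

t = NamedTensor([users, tags, weeks])
t.set(("alice", "#polystore", 1), 3.0)
t.set(("bob", "#polystore", 1), 1.0)
t.set(("alice", "#tensor", 2), 2.0)

week1 = t.select("week", 1)  # order-2 tensor: user x hashtag for week 1
```

In this sketch the typed schema is simply the list of `Dimension` objects, and `select` stands in for the kind of operator with clearly defined semantics that the abstract mentions. A wrapper for a concrete data store would be responsible for filling such a structure from that store's native model.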

Notes

  1. https://neo4j.com/developer/graph-algorithms/.

  2. https://www.vertica.com/product/database-machine-learning/.

  3. https://www.paradigm4.com/.

  4. https://www.tensorflow.org/.

  5. http://deeplearning.net/software/theano/.

  6. https://keras.io/.

  7. https://spark.apache.org/mllib/.

  8. http://wp.sigmod.org/?p=1629.

  9. To be isomorphic, two data models must allow two-way transformations at the structure level and must also support equivalence between their sets of operators. For example, the graph data model and the relational data model are not isomorphic, because the relational data model with relational algebra does not directly support transitive closure.

  10. https://spark.apache.org/sql/.

  11. https://drill.apache.org/.

  12. http://forward.ucsd.edu/.

  13. https://hive.apache.org/.

  14. https://amplab.cs.berkeley.edu/software/.

  15. http://www.alluxio.org/.

  16. https://azure.microsoft.com/en-us/services/data-lake-analytics/.

  17. https://www.ibm.com/analytics/data-lake.

  18. https://botometer.iuni.iu.edu/.

Author information

Corresponding author

Correspondence to Eric Leclercq.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Leclercq, E., Savonnet, M. (2019). TDM: A Tensor Data Model for Logical Data Independence in Polystore Systems. In: Gadepally, V., Mattson, T., Stonebraker, M., Wang, F., Luo, G., Teodoro, G. (eds) Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2018, Poly 2018. Lecture Notes in Computer Science, vol 11470. Springer, Cham. https://doi.org/10.1007/978-3-030-14177-6_4

  • DOI: https://doi.org/10.1007/978-3-030-14177-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14176-9

  • Online ISBN: 978-3-030-14177-6

  • eBook Packages: Computer Science, Computer Science (R0)
