Abstract
Computer networks are the veins of modern distributed systems. Array DBMS (Database Management Systems) operate on big data that is naturally modeled as arrays, e.g. Earth remote sensing data and numerical simulation output. The sheer data volume forces array DBMS to be distributed and to rely heavily on computer networks. The R&D area of array DBMS is relatively young, and machine learning is only beginning to make its way into array DBMS. Hence, existing work in this area is sparse and just emerging. This paper considers distributed large matrix multiplication (LMM) executed directly inside array DBMS. LMM is the core operation of many machine learning techniques on big data. LMM directly inside array DBMS has been neither well studied nor well optimized. We present novel LMM approaches for array DBMS and analyze the intricacies of LMM in array DBMS, including execution plan construction and network utilization. We carry out a performance evaluation in the Microsoft Azure Cloud on a network cluster of virtual machines, report insights derived from the experiments, and present our vision of future machine learning R&D directions based on LMM directly inside array DBMS.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Rodriges Zalipynis, R.A. (2021). Towards Machine Learning in Distributed Array DBMS: Networking Considerations. In: Renault, É., Boumerdassi, S., Mühlethaler, P. (eds) Machine Learning for Networking. MLN 2020. Lecture Notes in Computer Science, vol 12629. Springer, Cham. https://doi.org/10.1007/978-3-030-70866-5_19
DOI: https://doi.org/10.1007/978-3-030-70866-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70865-8
Online ISBN: 978-3-030-70866-5
eBook Packages: Computer Science (R0)