Towards Machine Learning in Distributed Array DBMS: Networking Considerations

  • Conference paper
Machine Learning for Networking (MLN 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12629)

Abstract

Computer networks are the veins of modern distributed systems. Array DBMS (Database Management Systems) operate on big data that is naturally modeled as arrays, e.g. Earth remote sensing and numerical simulation data. Big data forces array DBMS to be distributed and to make heavy use of computer networks. The R&D area of array DBMS is relatively young, and machine learning is only beginning to make its way into array DBMS. Hence, existing work in this area is rather sparse and just emerging. This paper considers distributed large matrix multiplication (LMM) executed directly inside array DBMS. LMM is the core operation of many machine learning techniques on big data, yet LMM directly inside array DBMS is not well studied or optimized. We present novel LMM approaches for array DBMS and analyze the intricacies of LMM in array DBMS, including execution plan construction and network utilization. We carry out a performance evaluation in the Microsoft Azure Cloud on a network cluster of virtual machines, report insights derived from the experiments, and present our vision for future machine learning R&D directions based on LMM directly inside array DBMS.
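To make the link between distributed LMM and network utilization concrete, here is a minimal sketch, not the paper's approach or execution plan. It assumes two equally sized square matrices stored as Python lists, a tile (chunk) size `t` that divides the matrix size, and a hypothetical placement function `owner()` that stores tile (bi, bj) on node (bi + bj) mod `nodes`. Every helper name here is illustrative. Each output tile C(i, j) is computed on the node that owns it, and the sketch counts how many input tiles would have to cross the network under that placement.

```python
from itertools import product

# Hypothetical helpers for illustration only; not the paper's implementation.

def split_into_chunks(m, t):
    """Split a square n x n matrix (list of lists) into t x t tiles keyed by tile coordinates."""
    n = len(m)
    assert n % t == 0, "this sketch assumes the tile size divides the matrix size"
    k = n // t
    return {
        (bi, bj): [row[bj * t:(bj + 1) * t] for row in m[bi * t:(bi + 1) * t]]
        for bi, bj in product(range(k), repeat=2)
    }


def tile_matmul(a, b):
    """Multiply two t x t tiles with the textbook triple loop."""
    t = len(a)
    return [[sum(a[r][x] * b[x][c] for x in range(t)) for c in range(t)] for r in range(t)]


def tile_add(a, b):
    """Element-wise sum of two tiles of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def owner(coord, nodes):
    """Hypothetical chunk placement: tile (bi, bj) is stored on node (bi + bj) mod nodes."""
    bi, bj = coord
    return (bi + bj) % nodes


def distributed_lmm(A, B, t, nodes):
    """Compute C = A x B tile by tile.

    Each output tile C(i, j) is computed on the node that owns it under owner().
    Returns (C, transfers), where transfers counts input tiles that are not stored
    on the computing node and would have to be shipped over the network
    (one transfer per use; repeated tiles are not cached).
    """
    ca, cb = split_into_chunks(A, t), split_into_chunks(B, t)
    k = len(A) // t
    c_tiles, transfers = {}, 0
    for i, j in product(range(k), repeat=2):
        home = owner((i, j), nodes)
        acc = [[0] * t for _ in range(t)]
        for x in range(k):
            transfers += int(owner((i, x), nodes) != home)  # A(i, x) is remote
            transfers += int(owner((x, j), nodes) != home)  # B(x, j) is remote
            acc = tile_add(acc, tile_matmul(ca[(i, x)], cb[(x, j)]))
        c_tiles[(i, j)] = acc
    # Stitch the output tiles back into one n x n matrix.
    n = len(A)
    C = [[c_tiles[(r // t, c // t)][r % t][c % t] for c in range(n)] for r in range(n)]
    return C, transfers
```

For example, `distributed_lmm([[1, 2], [3, 4]], [[5, 6], [7, 8]], t=1, nodes=2)` returns `([[19, 22], [43, 50]], 8)`: under this placement every output tile needs two remote input tiles. With `nodes=1` the result is the same matrix and the transfer count drops to 0. The sketch only exposes the network cost of one fixed placement; choosing a better plan (for instance, co-locating the row tiles of A and the column tiles of B that each output tile needs) is where an execution plan builder would come in.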

Author information

Correspondence to Ramon Antonio Rodriges Zalipynis.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Rodriges Zalipynis, R.A. (2021). Towards Machine Learning in Distributed Array DBMS: Networking Considerations. In: Renault, É., Boumerdassi, S., Mühlethaler, P. (eds) Machine Learning for Networking. MLN 2020. Lecture Notes in Computer Science, vol 12629. Springer, Cham. https://doi.org/10.1007/978-3-030-70866-5_19

  • DOI: https://doi.org/10.1007/978-3-030-70866-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-70865-8

  • Online ISBN: 978-3-030-70866-5

  • eBook Packages: Computer Science, Computer Science (R0)
