A scalable array storage for efficient maintenance of future data

The Journal of Supercomputing

Abstract

Array-based storage systems have attracted renewed interest in large-volume data applications because they are easy to maintain. However, conventional array storage schemes lack scalability for dynamic data: the whole array must be reallocated when its size limit is exceeded. Conventional array storage is therefore difficult to use when the data grows over time. To keep up with the velocity of future data, the array storage must be dynamic, expanding in size as the data grows. Moreover, the address space of an array-based storage system overflows quickly when the dimension lengths and the number of dimensions are large. Index array models provide dynamic storage, but retrieval from an index array model is slower than from the conventional schemes. In this paper, we demonstrate an index array-based scalable array storage that maintains growing future data at runtime. The key idea is to convert an n-dimensional array into 2 dimensions and organize the array elements into ordered collections called segments. These segments divide a large allocation into smaller ones, which delays address space overflow. The retrieval performance of the proposed scheme outperforms that of other existing array systems. Because it converts an n-dimensional array into 2 dimensions, it needs only 2 indices to maintain scalability, which reduces the index overhead as well. The scheme also shows better storage management performance than other approaches.
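
The idea of converting an n-dimensional array into 2 dimensions can be illustrated with a minimal sketch. The abstract does not state the exact mapping, so the grouping used here (alternate dimensions go to the row index and the column index) and the helper names `split_dims`, `linearize`, `to_2d` and `shape_2d` are assumptions made for illustration only, not the scheme of the paper. Segments are not modelled. The sketch also shows why a single linear address overflows earlier than a pair of indices.

```python
from math import prod

def split_dims(shape):
    """Assumed grouping: even-position dimensions form the row index,
    odd-position dimensions form the column index."""
    rows = list(range(0, len(shape), 2))
    cols = list(range(1, len(shape), 2))
    return rows, cols

def linearize(index, shape, dims):
    """Row-major offset of `index` restricted to the dimensions in `dims`."""
    offset = 0
    for d in dims:
        offset = offset * shape[d] + index[d]
    return offset

def to_2d(index, shape):
    """Map an n-dimensional index to a (row, column) pair."""
    rows, cols = split_dims(shape)
    return linearize(index, shape, rows), linearize(index, shape, cols)

def shape_2d(shape):
    """Extent of the resulting 2-dimensional array."""
    rows, cols = split_dims(shape)
    return prod(shape[d] for d in rows), prod(shape[d] for d in cols)

# A 4-dimensional array of shape 3 x 4 x 5 x 6 becomes a 15 x 24 array.
shape = (3, 4, 5, 6)
print(shape_2d(shape))             # (15, 24)
print(to_2d((2, 3, 4, 5), shape))  # (14, 23)

# Six dimensions of length 2**11 need 2**66 linear addresses, which does not
# fit in a 64-bit address, while each of the two indices needs only 2**33.
big = (2**11,) * 6
print(prod(big) > 2**64)                        # True
print(all(n < 2**64 for n in shape_2d(big)))    # True
```

Under this assumed mapping every n-dimensional index corresponds to exactly one (row, column) pair, so only two indices need to be maintained regardless of n.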

Author information

Corresponding author

Correspondence to K. M. Azharul Hasan.

About this article

Cite this article

Omar, M.T., Azharul Hasan, K.M. & Tsuji, T. A scalable array storage for efficient maintenance of future data. J Supercomput 77, 6540–6565 (2021). https://doi.org/10.1007/s11227-020-03554-x

  • DOI: https://doi.org/10.1007/s11227-020-03554-x
