Abstract
The rapid growth of data volumes means that large-scale scientific databases often suffer a long delay between the moment data are loaded into the system and the moment the system is ready to answer queries. To address this problem, we propose FASTLoad, an efficient parallel data loading approach. It is designed to maximize utilization of the available resources (e.g., network bandwidth and main memory) in order to optimize data loading in large-scale scientific database systems based on the array data model. To verify its efficiency, we implemented FASTLoad in our Adaptable Data Loading System and evaluated its performance on large scientific data sets of various sizes. Our experimental results show that FASTLoad can be 4 to 6 times faster than the built-in loading techniques of state-of-the-art array-based scientific database systems.
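The abstract does not describe FASTLoad's internals. The following is therefore only a generic sketch of parallel loading into an array store and does not reproduce FASTLoad itself. It assumes a raw ASCII file of `i,j,value` records for a 2-D array. The names `split_on_newlines`, `parse_chunk`, and `parallel_load` and the `workers` and `chunk_size` parameters are hypothetical. The sketch splits the input on record boundaries and parses the pieces with a bounded worker pool, where the pool size stands in for a resource budget. It then groups the parsed cells into fixed-size array chunks, which is how array databases commonly store data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of generic parallel loading; this is NOT FASTLoad.
# Assumed input: ASCII text, one "i,j,value" record per line, for a 2-D array.


def split_on_newlines(data: bytes, n_parts: int) -> list[bytes]:
    """Split raw bytes into roughly n_parts pieces, each ending on a record boundary."""
    if not data:
        return []
    target = max(1, len(data) // max(1, n_parts))
    parts, start = [], 0
    while start < len(data):
        end = start + target
        if end >= len(data):
            end = len(data)
        else:
            # Extend the cut to the next newline so that no record is split in two.
            nl = data.find(b"\n", end - 1)
            end = len(data) if nl == -1 else nl + 1
        parts.append(data[start:end])
        start = end
    return parts


def parse_chunk(chunk: bytes) -> list[tuple[int, int, float]]:
    """Parse one piece of raw input into (i, j, value) cells."""
    cells = []
    for line in chunk.decode("ascii").splitlines():
        if line.strip():
            i, j, v = line.split(",")
            cells.append((int(i), int(j), float(v)))
    return cells


def parallel_load(data: bytes, workers: int = 4, chunk_size: int = 2) -> dict:
    """Parse pieces concurrently, then group cells into chunk_size x chunk_size array chunks."""
    pieces = split_on_newlines(data, workers)
    store: dict[tuple[int, int], dict[tuple[int, int], float]] = {}
    # max_workers bounds concurrency, standing in for a cap on CPU/memory/bandwidth use.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Results are merged in the calling thread, so the shared store needs no lock.
        for cells in pool.map(parse_chunk, pieces):
            for i, j, v in cells:
                key = (i // chunk_size, j // chunk_size)
                store.setdefault(key, {})[(i, j)] = v
    return store
```

For example, `parallel_load(b"0,0,1.5\n0,1,2.0\n3,3,4.0\n2,1,0.5\n", workers=2, chunk_size=2)` places cells (0,0) and (0,1) in array chunk (0,0), cell (3,3) in chunk (1,1), and cell (2,1) in chunk (1,0). Threads keep the sketch short. In CPython, however, the global interpreter lock limits CPU-bound parsing, so a real loader would more likely use separate processes or native code.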
Acknowledgments
This work was supported by the China Ministry of Science and Technology under the State Key Development Program for Basic Research (2012CB821800), the Fund of the National Natural Science Foundation of China (Nos. 61462012, 61562010, U1531246), the Scientific Research Fund for Talent Recruitment of Guizhou University (No. 700246003301), the Science and Technology Fund of Guizhou Province (No. J [2013]2099), the High Tech. Project Fund of the Guizhou Development and Reform Commission (No. [2013]2069), the Industrial Research Projects of the Science and Technology Plan of Guizhou Province (No. GY[2014]3018), and the Major Applied Basic Research Program of Guizhou Province (Nos. JZ20142001, JZ20142001-05).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, H., Li, H., Chen, M., Dai, Z., Zhu, M., Huang, M. (2015). Enhancing Parallel Data Loading for Large Scale Scientific Database. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science(), vol 9529. Springer, Cham. https://doi.org/10.1007/978-3-319-27122-4_11
DOI: https://doi.org/10.1007/978-3-319-27122-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27121-7
Online ISBN: 978-3-319-27122-4
eBook Packages: Computer Science, Computer Science (R0)