Abstract
With advances in science and technology, scientific data are growing rapidly in both size and complexity, which makes efficient storage and parallel analysis of such data a major challenge. Earlier approaches that combine a traditional relational database with analysis software cannot efficiently meet the performance requirements of analysis over large-scale scientific data. In this paper, we present FASTDB, a distributed array database system that is optimized for massive scientific data management and provides shared-nothing, parallel array processing. To demonstrate the intrinsic performance characteristics of FASTDB, we applied it to the interactive analysis of data from astronomical surveys and designed a series of experiments based on scientific analysis tasks. The experimental results show that FASTDB can be significantly faster than SkyServer, which is built on a traditional database, in many typical analytical scenarios.
Acknowledgments
This work was supported by the China Ministry of Science and Technology under the State Key Development Program for Basic Research (2012CB821800), Fund of National Natural Science Foundation of China (No. 61462012, 61562010, U1531246), Scientific Research Fund for talents recruiting of Guizhou University (No. 700246003301), Science and Technology Fund of Guizhou Province (No. J [2013]2099), High Tech. Project Fund of Guizhou Development and Reform Commission (No. [2013]2069), Industrial Research Projects of the Science and Technology Plan of Guizhou Province (No. GY[2014]3018) and The Major Applied Basic Research Program of Guizhou Province (No. JZ20142001, No. JZ20142001-05).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, H. et al. (2015). FASTDB: An Array Database System for Efficient Storing and Analyzing Massive Scientific Data. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9532. Springer, Cham. https://doi.org/10.1007/978-3-319-27161-3_55
DOI: https://doi.org/10.1007/978-3-319-27161-3_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27160-6
Online ISBN: 978-3-319-27161-3
eBook Packages: Computer Science (R0)