FASTDB: An Array Database System for Efficient Storing and Analyzing Massive Scientific Data

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9532)

Abstract

With the development of science and technology, the size and complexity of scientific data are increasing rapidly, which makes efficient storage and parallel analysis of scientific data a major challenge. Previous techniques that combine a traditional relational database with analysis software cannot efficiently meet the performance requirements of analysis over large-scale scientific data. In this paper, we present FASTDB, a distributed array database system that is optimized for massive scientific data management and provides shared-nothing, parallel array processing and analysis. To demonstrate the intrinsic performance characteristics of FASTDB, we applied it to the interactive analysis of data from astronomical surveys and designed a series of experiments with scientific analysis tasks. According to the experimental results, FASTDB can be significantly faster than the traditional-database-based SkyServer in many typical analytical scenarios.



Acknowledgments

This work was supported by the China Ministry of Science and Technology under the State Key Development Program for Basic Research (2012CB821800), Fund of National Natural Science Foundation of China (No. 61462012, 61562010, U1531246), Scientific Research Fund for talents recruiting of Guizhou University (No. 700246003301), Science and Technology Fund of Guizhou Province (No. J [2013]2099), High Tech. Project Fund of Guizhou Development and Reform Commission (No. [2013]2069), Industrial Research Projects of the Science and Technology Plan of Guizhou Province (No. GY[2014]3018) and The Major Applied Basic Research Program of Guizhou Province (No. JZ20142001, No. JZ20142001-05).

Author information

Corresponding author

Correspondence to Hui Li.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, H. et al. (2015). FASTDB: An Array Database System for Efficient Storing and Analyzing Massive Scientific Data. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9532. Springer, Cham. https://doi.org/10.1007/978-3-319-27161-3_55

  • DOI: https://doi.org/10.1007/978-3-319-27161-3_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27160-6

  • Online ISBN: 978-3-319-27161-3

  • eBook Packages: Computer Science, Computer Science (R0)
