Abstract
In the era of Big Data, scientific research faces the challenge of handling massive data sets. To take advantage of Big Data, the key problem is to retrieve the desired cup of data from the ocean, since most applications need only a fraction of the entire data set. Because an indexing and retrieval method is intrinsically tied to the specific features of the data set and to the goal of the research, a universal solution is hardly possible. Designed for efficiently querying Big Data in astronomical time-domain research, AQUAdex is a new spatial indexing and retrieval method proposed to extract time series images from astronomical Big Data. By mapping images to tiles (pixels) on the celestial sphere, AQUAdex completes queries 9 times faster, as shown by both theoretical analysis and experimental results. AQUAdex is especially suitable for Big Data applications because of its excellent scalability: the query time increases by only 59% while the data size grows 14-fold.
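The central idea in the abstract, mapping each image to the sky tiles it covers so that a positional query inspects only one tile's images instead of the whole archive, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple equal-angle RA/Dec grid in place of a real sky pixelization such as HEALPix, ignores RA wrap-around at 0°/360° and the poles, and every name in it (`Image`, `tile_of`, `TileIndex`, `TILE_DEG`) is hypothetical.

```python
"""Hypothetical sketch of tile-based indexing for time series image retrieval.

An equal-angle RA/Dec grid stands in for a real sky pixelization such as
HEALPix; all names are illustrative and are not AQUAdex's actual API.
"""
import math
from collections import defaultdict
from dataclasses import dataclass

TILE_DEG = 1.0  # hypothetical tile size in degrees


@dataclass(frozen=True)
class Image:
    image_id: str
    mjd: float     # observation time (Modified Julian Date)
    ra_min: float  # footprint bounds in degrees (RA wrap-around not handled)
    ra_max: float
    dec_min: float
    dec_max: float

    def contains(self, ra: float, dec: float) -> bool:
        """Exact containment test on the rectangular footprint."""
        return self.ra_min <= ra <= self.ra_max and self.dec_min <= dec <= self.dec_max


def tile_of(ra: float, dec: float) -> tuple[int, int]:
    """Map a sky position (degrees) to the integer index of its grid tile."""
    return (math.floor(ra / TILE_DEG), math.floor((dec + 90.0) / TILE_DEG))


class TileIndex:
    """Inverted index from tile index to the images whose footprint overlaps it."""

    def __init__(self) -> None:
        self._tiles: dict[tuple[int, int], list[Image]] = defaultdict(list)

    def add(self, img: Image) -> None:
        # Register the image under every tile its bounding box overlaps.
        lo_ra, lo_dec = tile_of(img.ra_min, img.dec_min)
        hi_ra, hi_dec = tile_of(img.ra_max, img.dec_max)
        for i in range(lo_ra, hi_ra + 1):
            for j in range(lo_dec, hi_dec + 1):
                self._tiles[(i, j)].append(img)

    def time_series(self, ra: float, dec: float) -> list[Image]:
        # Scan only the bucket of the tile containing (ra, dec), then apply
        # the exact footprint test and order the hits by observation time.
        candidates = self._tiles.get(tile_of(ra, dec), [])
        hits = [img for img in candidates if img.contains(ra, dec)]
        return sorted(hits, key=lambda img: img.mjd)


# Usage with three made-up images; two of them overlap the queried position.
idx = TileIndex()
idx.add(Image("a", 57001.0, 10.0, 12.0, -5.0, -3.0))
idx.add(Image("b", 57000.0, 11.0, 13.0, -4.0, -2.0))
idx.add(Image("c", 57002.0, 50.0, 52.0, 20.0, 22.0))
print([img.image_id for img in idx.time_series(11.5, -3.5)])  # ['b', 'a']
```

Because a query touches a single tile's bucket, its cost in this sketch depends mainly on how many images overlap that tile rather than on the total archive size. That is one plausible reason a tile-based index can scale well as data grows, though the sketch makes no claim to reproduce the paper's measured figures.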
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (NSFC) through grants 61303021, U1531111 and U1231108. The data set used in the experiments was provided by the AST3 team of NAOC. The authors wish to express their gratitude to Ms. Yiyi Gao and Ms. Xingyu Xu for their insightful suggestions. Sincere thanks also go to Mr. Jie Wen for helping put the final touches in place.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hong, Z. et al. (2015). AQUAdex: A Highly Efficient Indexing and Retrieving Method for Astronomical Big Data of Time Series Images. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9529. Springer, Cham. https://doi.org/10.1007/978-3-319-27122-4_7
DOI: https://doi.org/10.1007/978-3-319-27122-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27121-7
Online ISBN: 978-3-319-27122-4
eBook Packages: Computer Science; Computer Science (R0)