DOI: 10.1145/3003819.3003823
short-paper

AstroSpark: towards a distributed data server for big data in astronomy

Published: 31 October 2016

Abstract

Large amounts of astronomical data are continuously collected, so scalable, high-performance query processing over such data has become increasingly necessary. Apache Spark has been widely adopted as a successor to Apache Hadoop MapReduce for analyzing Big Data on distributed frameworks. Despite its rich features, Spark cannot be directly exploited for processing astronomical data. In this work, we present AstroSpark, a distributed data server for astronomical data. AstroSpark extends Spark, a distributed in-memory computing framework, to analyze and query huge volumes of astronomical data. It supports astronomical operations such as cone search, cross-match, and histogram. AstroSpark introduces data partitioning and optimization techniques to achieve high-performance query execution.
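
To make the two spatial operations named in the abstract concrete, the following is a minimal single-machine sketch in plain Python of what a cone search and a cross-match compute. It is not AstroSpark's implementation: the abstract does not describe its API, so the function names (angular_separation_deg, cone_search, cross_match), the (id, ra, dec) row layout, and the toy catalog are all hypothetical, and the brute-force scans stand in for the partitioned, distributed execution that AstroSpark targets.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions.

    Uses the haversine formula, which stays accurate for small angles
    and handles the 0/360 degree wrap-around in right ascension.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra0, dec0, radius_deg):
    """Return every (id, ra, dec) row within radius_deg of (ra0, dec0)."""
    return [row for row in catalog
            if angular_separation_deg(row[1], row[2], ra0, dec0) <= radius_deg]


def cross_match(cat_a, cat_b, radius_deg):
    """Naive O(n*m) cross-match: id pairs of sources closer than radius_deg."""
    return [(a[0], b[0]) for a in cat_a for b in cat_b
            if angular_separation_deg(a[1], a[2], b[1], b[2]) <= radius_deg]


# Hypothetical toy catalog: (source_id, ra_deg, dec_deg)
catalog = [
    ("s1", 10.00, 20.00),
    ("s2", 10.05, 20.02),
    ("s3", 200.0, -45.0),
    ("s4", 359.99, 0.0),
]

# All sources within 0.1 degrees of (ra=10, dec=20)
hits = cone_search(catalog, 10.0, 20.0, 0.1)
print([row[0] for row in hits])
```

Both functions scan every row, which is the cost a partitioning scheme is meant to avoid: if sources are grouped by sky region, a query only needs to inspect the partitions that overlap the search cone.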


Published In

SIGSPATIAL PhD '16: Proceedings of the 3rd ACM SIGSPATIAL PhD Symposium
October 2016
22 pages
ISBN: 9781450345842
DOI: 10.1145/3003819

Sponsors

  • ESRI
  • amazon
  • Google Inc.
  • Microsoft
  • ORACLE
  • Facebook

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. astronomical survey data management
  2. big data
  3. query processing
  4. spark framework

Qualifiers

  • Short-paper

Conference

SIGSPATIAL'16
Sponsor:
  • amazon
  • Microsoft
  • ORACLE
  • Facebook

Cited By

  • (2024) A Columnar Storage Cone Search Method for Massive Astronomical Data. 2024 5th International Conference on Information Science, Parallel and Distributed Systems (ISPDS), pp. 319-322. DOI: 10.1109/ISPDS62779.2024.10667647. Online publication date: 31-May-2024
  • (2022) Astronomical big data processing using machine learning: A comprehensive review. Experimental Astronomy 53:1, pp. 1-43. DOI: 10.1007/s10686-021-09827-4. Online publication date: 14-Jan-2022
  • (2020) A Strategy and Architecture Based on Big Data for Power Internet of Things. Proceedings of the 4th International Conference on Computer Science and Application Engineering, pp. 1-5. DOI: 10.1145/3424978.3424999. Online publication date: 20-Oct-2020
  • (2020) Detecting cache-related bugs in Spark applications. Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 363-375. DOI: 10.1145/3395363.3397353. Online publication date: 18-Jul-2020
  • (2020) ASTROIDE: A Unified Astronomical Big Data Processing Engine over Spark. IEEE Transactions on Big Data 6:3, pp. 477-491. DOI: 10.1109/TBDATA.2018.2873749. Online publication date: 1-Sep-2020
  • (2020) Query Processing and Access Methods for Big Astro and Geo Databases. Knowledge Discovery in Big Data from Astronomy and Earth Observation, pp. 159-171. DOI: 10.1016/B978-0-12-819154-5.00018-7. Online publication date: 2020
  • (2019) EventDB: A Large-Scale Semi-structured Scientific Data Management System. Big Scientific Data Management, pp. 105-115. DOI: 10.1007/978-3-030-28061-1_12. Online publication date: 7-Aug-2019
  • (2018) Efficient astronomical query processing using spark. Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 229-238. DOI: 10.1145/3274895.3274942. Online publication date: 6-Nov-2018
  • (2018) FITS Data Source for Apache Spark. Computing and Software for Big Science 2:1. DOI: 10.1007/s41781-018-0014-z. Online publication date: 30-Oct-2018
  • (2017) HX-MATCH: In-Memory Cross-Matching Algorithm for Astronomical Big Data. Advances in Spatial and Temporal Databases, pp. 411-415. DOI: 10.1007/978-3-319-64367-0_26. Online publication date: 22-Jul-2017
