DOI: 10.1145/3035918.3064041
research-article
Public Access

Incremental View Maintenance over Array Data

Published: 09 May 2017

Abstract

Science applications are producing an ever-increasing volume of multi-dimensional data that are mainly processed with distributed array databases. These raw arrays are "cooked" into derived data products using complex, time-consuming pipelines. As a result, derived data products are released infrequently and become stale soon thereafter. In this paper, we introduce materialized array views as a database construct for scientific data products. We model the "cooking" process as incremental view maintenance with batch updates and give a three-stage heuristic that finds effective update plans. Moreover, as a side effect of view maintenance, the heuristic continuously repartitions the array and the view based on a window of past updates, without additional overhead. We design an analytical cost model for integrating materialized array views in queries. A thorough experimental evaluation confirms that the proposed techniques are able to incrementally maintain a real astronomical data product in a production environment.
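
To make the idea of incremental view maintenance with batch updates concrete, the following is a minimal sketch and not the paper's three-stage heuristic. It assumes a sparse 2-D array stored as a set of non-empty cell coordinates and a hypothetical view that stores, for each non-empty cell, the number of non-empty cells within Chebyshev distance 1 (a simple similarity self-join count). The names `neighbors`, `recompute_view`, and `apply_batch` are illustrative and do not come from the paper; distribution, chunking, and repartitioning are ignored.

```python
from collections import Counter
from itertools import product


def neighbors(cell, radius=1):
    """Yield every coordinate within Chebyshev distance `radius` of `cell`, including `cell`."""
    x, y = cell
    for dx, dy in product(range(-radius, radius + 1), repeat=2):
        yield (x + dx, y + dy)


def recompute_view(array):
    """Full recomputation: for each non-empty cell, count non-empty cells in its neighborhood."""
    return Counter({c: sum(1 for n in neighbors(c) if n in array) for c in array})


def apply_batch(array, view, batch):
    """Fold a batch of inserted cells into `array` and `view` without a full recomputation.

    Assumption: the batch contains insertions only; deletions are not handled in this sketch.
    """
    for cell in batch:
        if cell in array:
            continue  # cell already present, nothing to maintain
        array.add(cell)
        view[cell] = 0
        for n in neighbors(cell):
            if n in array:
                view[cell] += 1      # the new cell counts n (and itself)
                if n != cell:
                    view[n] += 1     # the existing neighbor n now counts the new cell
    return view
```

Only the neighborhoods of the inserted cells are touched, so the cost of a batch depends on its size rather than on the size of the array. Comparing the result of `apply_batch` against `recompute_view` on the updated array is a simple way to check such a maintenance routine.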


Information & Contributors

Information

Published In

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN: 9781450341974
DOI: 10.1145/3035918
© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. array similarity join
  2. batch updates
  3. greedy heuristics
  4. mixed-integer programming
  5. workload-driven array reorganization

Qualifiers

  • Research-article

Funding Sources

  • U.S. Department of Energy

Conference

SIGMOD/PODS'17
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 71
  • Downloads (Last 6 weeks): 12
Reflects downloads up to 01 Mar 2025

Cited By

  • (2024) Quantum Tensor DBMS and Quantum Gantt Charts: Towards Exponentially Faster Earth Data Engineering. Earth, 5:3 (491-547). DOI: 10.3390/earth5030027. Online publication date: 14-Sep-2024
  • (2024) MulRF: A Multi-Dimensional Range Filter for Sublinear Time Range Query Processing. IEEE Transactions on Knowledge and Data Engineering, 36:11 (6600-6613). DOI: 10.1109/TKDE.2024.3397313. Online publication date: Nov-2024
  • (2023) F-IVM: analytics over relational databases under updates. The VLDB Journal, 33:4 (903-929). DOI: 10.1007/s00778-023-00817-w. Online publication date: 14-Nov-2023
  • (2022) Classifier Construction Under Budget Constraints. Proceedings of the 2022 International Conference on Management of Data (1160-1174). DOI: 10.1145/3514221.3517863. Online publication date: 10-Jun-2022
  • (2022) Chunk-oriented dimension ordering for efficient range query processing on sparse multidimensional data. World Wide Web, 26:4 (1395-1433). DOI: 10.1007/s11280-022-01098-z. Online publication date: 9-Sep-2022
  • (2022) ReSKY: Efficient Subarray Skyline Computation in Array Databases. Distributed and Parallel Databases, 40:2-3 (261-298). DOI: 10.1007/s10619-022-07419-5. Online publication date: 1-Sep-2022
  • (2021) Array DBMS. Proceedings of the VLDB Endowment, 14:12 (3186-3189). DOI: 10.14778/3476311.3476404. Online publication date: 28-Oct-2021
  • (2021) Convergence of Array DBMS and Cellular Automata. Proceedings of the 2021 International Conference on Management of Data (2399-2403). DOI: 10.1145/3448016.3458457. Online publication date: 9-Jun-2021
  • (2020) Incremental and Approximate Computations for Accelerating Deep CNN Inference. ACM Transactions on Database Systems, 45:4 (1-42). DOI: 10.1145/3397461. Online publication date: 6-Dec-2020
  • (2019) Machine learning meets big spatial data. Proceedings of the VLDB Endowment, 12:12 (1982-1985). DOI: 10.14778/3352063.3352115. Online publication date: 1-Aug-2019
