Scaling the Construction of Wavelet Synopses for Maximum Error Metrics


Abstract:

Modern analytics involve computations over enormous numbers of data records. The volume of data and the stringent response-time requirements place increasing emphasis on the efficiency of approximate query processing. A major challenge over the past years has been the construction of synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum error metric. Because it approximates sharp discontinuities well, wavelet decomposition has proved to be a very effective tool for data reduction. However, existing wavelet thresholding schemes that minimize maximum error metrics have complexities that are impractical for large datasets. Furthermore, they cannot efficiently handle the multi-dimensional version of the problem. To provide a practical solution, we develop parallel algorithms that take advantage of key properties of the wavelet decomposition and allocate tasks to multiple workers. To that end, we present (i) a general framework for the parallelization of existing dynamic programming (DP) algorithms, (ii) a parallel version of one such DP algorithm, and (iii) two highly efficient distributed greedy algorithms that can deal with data of arbitrary dimensionality. Our extensive experiments on both real and synthetic datasets over Hadoop show that the proposed algorithms achieve linear scalability and superior running-time performance compared to their centralized counterparts.
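
For readers unfamiliar with the terminology, the following minimal Python sketch illustrates what a wavelet synopsis and its maximum error are. It is not one of the algorithms proposed in the paper: it uses the unnormalized one-dimensional Haar transform and the conventional thresholding rule that retains the B coefficients of largest normalized magnitude, which targets overall squared error and gives no maximum-error guarantee. All function names (haar_decompose, haar_reconstruct, conventional_synopsis, max_abs_error) are hypothetical and chosen for illustration only.

```python
import math
from typing import List


def haar_decompose(data: List[float]) -> List[float]:
    """Unnormalized Haar transform via pairwise averaging and differencing.

    Assumes len(data) is a power of two. Index 0 holds the overall average;
    indices [2^l, 2^(l+1)) hold the detail coefficients of level l.
    """
    n = len(data)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a positive power of two")
    out = [0.0] * n
    current = list(data)
    length = n
    while length > 1:
        half = length // 2
        averages = [(current[2 * i] + current[2 * i + 1]) / 2 for i in range(half)]
        details = [(current[2 * i] - current[2 * i + 1]) / 2 for i in range(half)]
        out[half:length] = details
        current = averages
        length = half
    out[0] = current[0]
    return out


def haar_reconstruct(coeffs: List[float]) -> List[float]:
    """Invert haar_decompose: each average a with detail d yields (a + d, a - d)."""
    n = len(coeffs)
    current = [coeffs[0]]
    length = 1
    while length < n:
        details = coeffs[length:2 * length]
        nxt: List[float] = []
        for a, d in zip(current, details):
            nxt.extend([a + d, a - d])
        current = nxt
        length *= 2
    return current


def conventional_synopsis(coeffs: List[float], budget: int) -> List[float]:
    """Keep the `budget` coefficients with the largest normalized magnitude.

    Normalization weights a coefficient by the square root of the number of
    data values it influences. This is the classic squared-error heuristic,
    used here only as a baseline; it does not minimize maximum error.
    """
    n = len(coeffs)

    def weight(i: int) -> float:
        level = 0 if i == 0 else i.bit_length() - 1
        support = n >> level
        return abs(coeffs[i]) * math.sqrt(support)

    keep = set(sorted(range(n), key=weight, reverse=True)[:budget])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def max_abs_error(original: List[float], approx: List[float]) -> float:
    """Maximum absolute error between the data and its approximation."""
    return max(abs(x - y) for x, y in zip(original, approx))


if __name__ == "__main__":
    data = [2.0, 2.0, 0.0, 2.0, 3.0, 5.0, 4.0, 4.0]
    coeffs = haar_decompose(data)
    synopsis = conventional_synopsis(coeffs, budget=2)
    approx = haar_reconstruct(synopsis)
    print("coefficients:", coeffs)
    print("max absolute error with 2 coefficients:", max_abs_error(data, approx))
```

The sketch makes the motivation concrete: the retained coefficients are chosen without regard to the worst-case reconstruction error of any single value, which is the quantity the maximum-error thresholding schemes discussed in the abstract aim to bound.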
Published in: IEEE Transactions on Knowledge and Data Engineering ( Volume: 31, Issue: 9, 01 September 2019)
Page(s): 1794 - 1808
Date of Publication: 26 August 2018
