Abstract:
Probabilistic queries have been extensively explored to provide answers with confidence, in order to support real-life applications that must cope with uncertain data, such as sensor networks and data integration. However, the uncertainty of the data may propagate, so the results returned by probabilistic queries contain considerable noise, which significantly degrades query quality. In this paper, we propose an efficient optimization framework, termed QueryClean, for both probabilistic skyline computation and probabilistic similarity search. The goal of QueryClean is to optimize query quality by selecting a group of uncertain objects to clean under limited available resources, where a joint-entropy-based quality function is leveraged. We develop an efficient structure called ASI to index the possible result sets of probabilistic queries, which helps avoid many types of probabilistic query evaluations over a large number of possible worlds during quality computation. Moreover, we present exact and approximate algorithms for the optimization problem, using two newly presented heuristics. Extensive experimental results on both real and synthetic data sets demonstrate the efficiency and scalability of the proposed QueryClean framework.
Published in: IEEE Transactions on Knowledge and Data Engineering (Volume: 30, Issue: 9, 01 September 2018)