Abstract
Many applications, such as sensor networks, RFID-based monitoring systems, mobile object management and location-based services, manage data pervaded with uncertainty. Users typically prefer high-quality results (i.e., those with the highest certainty) when they pose queries with strict conditions over such data. However, because they may not know the contents of the underlying databases, their queries may fail, i.e., return no result or only results that do not satisfy the expected level of certainty. In this case, users may try to change the query conditions manually to obtain approximate answers. Because the number of combinations of query conditions is exponential, this procedure is time-consuming and frustrating. In this paper, we address the failing queries problem by proposing an approach that identifies the parts of the query, called Minimal Failing Subqueries (MFSs), that are responsible for its failure. Based on these MFSs, interactive and automatic approaches can be set up to help the user reformulate her/his query. At the same time, we compute a set of Maximal Succeeding Subqueries (XSSs), which are non-failing queries that keep a maximal number of predicates of the initial query. The results of these XSSs constitute good alternative answers that can be returned to the user instead of an empty result. To demonstrate the efficiency and effectiveness of our proposal, a set of experiments has been conducted on synthetic and real datasets. A comparison with baseline and related-work approaches shows the interest of our proposal.
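To make the notions of MFS and XSS concrete, the following is a minimal sketch that enumerates them by brute force over all subqueries of a small conjunctive query. It is not the matrix-based computation proposed in the paper. The relation `READINGS`, the predicates in `PREDICATES`, the threshold `alpha`, and the simplification that each tuple carries a single certainty degree are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy relation: (attribute values, certainty degree of the tuple).
READINGS = [
    ({"temp": 31, "humidity": 40, "zone": "A"}, 0.9),
    ({"temp": 18, "humidity": 75, "zone": "A"}, 0.8),
    ({"temp": 33, "humidity": 80, "zone": "B"}, 0.4),
]

# Hypothetical query: a conjunction of named predicates.
PREDICATES = {
    "hot": lambda r: r["temp"] > 30,
    "humid": lambda r: r["humidity"] > 70,
    "zoneA": lambda r: r["zone"] == "A",
}


def succeeds(names, rows, alpha):
    """True if some row satisfies every predicate in `names`
    with a certainty degree of at least `alpha` (assumed semantics)."""
    return any(
        cert >= alpha and all(PREDICATES[n](row) for n in names)
        for row, cert in rows
    )


def mfs_and_xss(rows, alpha):
    """Brute-force enumeration of all non-empty subqueries (exponential)."""
    names = sorted(PREDICATES)
    subqueries = [
        frozenset(c)
        for k in range(1, len(names) + 1)
        for c in combinations(names, k)
    ]
    failing = [q for q in subqueries if not succeeds(q, rows, alpha)]
    succeeding = [q for q in subqueries if succeeds(q, rows, alpha)]
    # MFS: a failing subquery with no failing proper subquery.
    mfs = [q for q in failing if not any(f < q for f in failing)]
    # XSS: a succeeding subquery with no succeeding proper superquery.
    xss = [q for q in succeeding if not any(q < s for s in succeeding)]
    return mfs, xss


if __name__ == "__main__":
    for alpha in (0.7, 0.3):
        mfs, xss = mfs_and_xss(READINGS, alpha)
        print(alpha, [sorted(q) for q in mfs], [sorted(q) for q in xss])
```

With `alpha = 0.7`, the only MFS in this toy data is `{hot, humid}`, and the XSSs are `{hot, zoneA}` and `{humid, zoneA}`. Lowering `alpha` to 0.3 lets the third tuple qualify, so the whole query becomes its own MFS and every pair of predicates is an XSS. The exhaustive enumeration also illustrates why manual reformulation scales poorly: the number of subqueries grows exponentially with the number of predicates.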
Notes
Given a set of objects described by a list of criteria, a skyline is the subset of objects that are not dominated (in the sense of Pareto) by any other object with respect to some criteria of interest (Börzsönyi et al. 2001).
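As an illustration of this definition, here is a naive quadratic sketch of Pareto dominance and skyline computation. It assumes that smaller values are better on every criterion; the function names and the sample data are hypothetical, and this is not one of the algorithms of Börzsönyi et al. (2001).

```python
def dominates(a, b):
    """a dominates b if a is no worse on every criterion and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical hotels described by (price, distance to the beach).
    hotels = [(50, 8), (80, 2), (60, 9), (90, 3), (70, 5)]
    print(skyline(hotels))  # [(50, 8), (80, 2), (70, 5)]
```

Here `(60, 9)` is dominated by `(50, 8)` and `(90, 3)` by `(80, 2)`, while the remaining three hotels are incomparable trade-offs between price and distance.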
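Note that maximal is not the same as maximum. A maximal successful subquery cannot be extended with a further predicate of the initial query without failing, so other successful subqueries may still contain more predicates. A maximum successful subquery has the largest number of predicates among all successful subqueries, so no successful subquery is larger.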
These real datasets are available at https://github.com/sean-chester/SkyBench.
The coverage w.r.t. a criterion is a set of MFSs that contain this criterion; for instance, a criterion may stand for some conditions that the query must satisfy.
References
Aggarwal, C.C., & Yu, P.S. (2009). A survey of uncertain data algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 21(5), 609–623.
Belheouane, C., Jean, S., Hadjali, A., Azzoune, H. (2017). Handling failing queries over uncertain databases. In: IEEE International conference on fuzzy systems (FUZZ-IEEE 2017), Naples, Italy, 9–12 July.
Börzsönyi, S., Kossmann, D., Stocker, K. (2001). The skyline operator. In: International conference on data engineering (ICDE 2001), pp. 421–430.
Bosc, P., & Pivert, O. (1995). SQLf: a relational database language for fuzzy querying. IEEE Transactions on Fuzzy Systems, 3(1), 1–17.
Bosc, P., Pivert, O., Prade, H. (2010). An uncertain database model and a query algebra based on possibilistic certainty. In: Soft computing and pattern recognition (SoCPaR), pp. 63–68.
Chambi, S., Lemire, D., Kaser, O., Godin, R. (2014). Better bitmap performance with roaring bitmaps. arXiv:1402.6407.
Chester, S., Šidlauskas, D., Assent, I., Bøgh, K.S. (2015). Scalable parallelization of skyline computation for multi-core processors. In: International conference on data engineering (ICDE 2015), pp. 1083–1094.
Conti, M., Willemsen, J., Crispo, B. (2013). Providing source location privacy in wireless sensor networks: a survey. IEEE Communications Surveys & Tutorials, 15(3), 1238–1280.
Dubois, D., & Prade, H. (1987). Necessity measures and the resolution principle. IEEE Transactions on Systems, Man and Cybernetics, 17(3), 474–478.
Dubois, D., & Prade, H. (2008). Possibility theory. New York: Plenum.
Fokou, G., Jean, S., Hadjali, A., Baron, M. (2016a). RDF query relaxation strategies based on failure causes. In: Extended Semantic Web Conference (ESWC 2016), pp. 439–454.
Fokou, G., Jean, S., Hadjali, A., Baron, M. (2016b). Handling failing RDF queries: From diagnosis to relaxation. Knowledge and information systems (KAIS).
Gao, Y., Liu, Q., Chen, G., Zheng, B., Zhou, L. (2015). Answering why-not questions on reverse top-k queries. The VLDB Journal, 8(7), 738–749.
Godfrey, P. (1997). Minimization in cooperative response to failing database queries. International Journal of Cooperative Information Systems, 6(2), 95–149.
Jannach, D. (2006). Techniques for fast query relaxation in content-based recommender systems. In: KI’06.
Jannach, D. (2008). Finding preferred query relaxations in content-based recommenders. In Intelligent techniques and tools for novel system architectures (pp. 81–97). Springer.
Jannach, D. (2009). Fast computation of query relaxations for knowledge-based recommenders. AI Communications, 22(4), 235–248.
Jeffery, S.R., Garofalakis, M., Franklin, M.J. (2006). Adaptive cleaning for RFID data streams. In: International conference on very large data bases (VLDB 2006), pp. 163–174.
Junker, U. (2004). Preferred explanations and relaxations for over-constrained problems. In: AAAI-2004.
Kashyap, A., Hristidis, V., Petropoulos, M. (2010). Facetor: Cost-driven exploration of faceted query results. In: International conference on information and knowledge management (CIKM 2010), pp. 719–728.
Kurose, J., Lyons, E., McLaughlin, D., Pepyne, D., Philips, B., Westbrook, D., Zink, M. (2006). An end-user-responsive sensor network architecture for hazardous weather detection, prediction and response. In: Asian internet engineering conference, pp. 1–15.
Li, M., & Liu, Y. (2009). Underground coal mine monitoring with wireless sensor networks. ACM Transactions on Sensor Networks (TOSN), 5(2), 10.
McSherry, D. (2004). Incremental relaxation of unsuccessful queries. In Advances in case-based reasoning (Vol. 3155, pp. 131–148). Berlin: Springer.
Min, D., & Yih, Y. (2011). Fuzzy logic-based approach to detecting a passive RFID tag in an outpatient clinic. Journal of Medical Systems, 35(3), 423–432.
Patroumpas, K., Papamichalis, M., Sellis, T. (2012). Probabilistic range monitoring of streaming uncertain positions in geosocial networks. In: Scientific and statistical database management (SSDBM 2012), pp. 20–37.
Pivert, O., & Prade, H. (2015). A certainty-based model for uncertain databases. IEEE Transactions on Fuzzy Systems, 23(4), 1181–1196.
Pivert, O., & Smits, G. (2015). How to efficiently diagnose and repair fuzzy database queries that fail. In Fifty years of fuzzy logic and its applications (pp. 499–517). Springer.
Pivert, O., Smits, G., Hadjali, A., Jaudoin, H. (2011). Efficient detection of minimal failing subqueries in a fuzzy querying context. In: East-European conference on advances in databases and information systems (ADBIS 2011), pp. 243–256.
Re, C., Dalvi, N., Suciu, D. (2007). Efficient top-k query evaluation on probabilistic data. In: International conference on data engineering (ICDE 2007), pp. 886–895.
Sen, P., Deshpande, A., Getoor, L. (2009). PrDB: Managing and exploiting rich correlations in probabilistic databases. The VLDB Journal, 18(5), 1065–1090.
Soliman, M.A., Ilyas, I.F., Chang, K.C.C. (2008). Probabilistic top-k and ranking-aggregate queries. ACM Transactions On Database Systems, 33(3), 13:1–13:54.
Tran, T.T., Peng, L., Diao, Y., McGregor, A., Liu, A. (2012). Claro: Modeling and processing uncertain data streams. The VLDB Journal, 21(5), 651–676.
Wang, Y., Li, X., Li, X., Wang, Y. (2013). A survey of queries over uncertain data. Knowledge and Information Systems (KAIS), 37(3), 485–530.
Cite this article
Belheouane, C., Jean, S., Azzoune, H. et al. Cooperative treatment of failing queries over uncertain databases: a matrix-computation-based approach. J Intell Inf Syst 52, 211–238 (2019). https://doi.org/10.1007/s10844-018-0538-z
DOI: https://doi.org/10.1007/s10844-018-0538-z