Elsevier

Applied Soft Computing

Volume 74, January 2019, Pages 51-65

On fuzzy approaches for enlarging skyline query results

https://doi.org/10.1016/j.asoc.2018.10.013

Highlights

  • We discuss two relaxation strategies. First, we propose recovering the most interesting points among those discarded by the Pareto dominance relationship by leveraging a novel fuzzy dominance relationship.

  • Then, we advocate enlarging the skyline with points that are the closest to the skyline points in the sense of a particular fuzzy absolute closeness relation.

  • We investigate a set of properties of the proposed approaches and provide efficient, optimized algorithms together with their complexity analysis.

  • We design and develop a user-friendly framework for skyline relaxation, endowed with rich and advanced functionalities.

  • A thorough experimental evaluation of our approaches is performed on large datasets.

Abstract

In the last decade, skyline queries have gained much attention and have proved valuable for multi-criteria decisions. Based on the concept of Pareto dominance, they return the non-dominated points, called the skyline points. In practice, the skyline may contain only a small number of points, which could be insufficient for the users’ needs. In this paper, we discuss two fuzzy-set-based approaches to enriching a small skyline with particular points that could serve the decision makers’ needs. The basic idea consists in identifying the most interesting points among the non-skyline ones. On the one hand, we introduce a novel fuzzy dominance relationship which makes the dominance between the points of interest more demanding. As a result, many more points are considered incomparable and thus become elements of the new relaxed skyline. On the other hand, we leverage an appropriate fuzzy closeness relation to retrieve non-skyline points that are fuzzily close to some skyline points. Furthermore, we develop efficient algorithms to compute the relaxed variants of the skyline. Extensive experiments are conducted to demonstrate the effectiveness of our approaches and to analyze the performance of the proposed algorithms. A comparative study of the presented approaches is made as well.

Introduction

Preference queries have gained much attention in the database field in recent years. Skyline queries [1] are a good example of SQL extensions that allow users to express their preferences in queries. Based on the Pareto dominance relationship, skyline queries select all non-dominated points using a multi-criteria comparison. Let U be a set of d-dimensional points; a skyline query returns the skyline S, the set of points of U that are not dominated by any other point of U. A point p dominates, in the sense of Pareto, another point q iff p is better than or equal to q in all dimensions and strictly better than q in at least one dimension. One can observe that skyline points are incomparable: none of them dominates another.
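As an illustration of the definition above, the following minimal Python sketch checks Pareto dominance and filters the skyline by brute force. It assumes that smaller values are better in every dimension; the function names and the hotel data (price, distance) are hypothetical and only serve the example.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """True if p Pareto-dominates q (assumption: smaller is better everywhere)."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (quadratic scan)."""
    return [q for q in points if not any(dominates(p, q) for p in points)]


# Hypothetical hotels described by (price, distance to the beach in km).
hotels = [(50, 2.0), (60, 1.0), (55, 2.5), (120, 6.0)]
print(skyline(hotels))
```

On this toy data only two hotels survive, which is the kind of small skyline the relaxation approaches aim to enlarge. The quadratic scan is chosen for clarity, not efficiency.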

It is worth noting that skyline queries have benefited many types of database applications since their introduction in the database field. They are widely used in non-trivial real applications, including multi-criteria decision making applications [2], [3], Web services such as hotel recommenders [4] and restaurant finders [5], and peer-to-peer network databases [6]. With this widespread use of skyline queries in real-life database applications, many research studies have been devoted to computing the skyline efficiently and to introducing multiple variants of skyline queries [7], [8], [9], [3], [10].

However, querying d-dimensional datasets using a skyline operator may lead to two possible scenarios: (i) a large number of skyline points is returned, which could be less informative for the users’ requirements; (ii) a small number of skyline points is returned, which could be insufficient for the users’ needs. To solve the problem stemming from the first scenario, various approaches have been proposed to refine the skyline, thereby reducing its size [11], [12], [13], [14], [15], [16], [17], [2], [18]. For the second scenario, only very few works exist that relax the skyline in order to increase the number of skyline results [13], [19], [20]. In this paper, we address the problem of a small skyline and propose advanced fuzzy-set-based solutions to enlarge it with a set of particular interesting (non-skyline) points. Such solutions exhibit a cooperative behavior in the sense that they assist users in obtaining the desired results for their skyline queries. Users’ preferences and control are two key elements of our solutions. The former are leveraged to choose specific skyline relaxation parameters and the latter allows ending the relaxation process when the results are satisfactory. In summary, the main contributions made in this paper are as follows:

  • 1. We address the skyline relaxation problem by proposing two efficient fuzzy approaches, MP2R and C2R. The former relies on a novel fuzzy dominance relationship which makes the dominance between two points more demanding. The latter leverages an appropriate fuzzy closeness relation to retrieve non-skyline points that are fuzzily close to skyline points. Both approaches allow adding new points to the skyline result.

  • 2. For each approach, the semantic basis of the relaxed variant of the skyline is discussed in depth. Then, optimized algorithms are developed to efficiently compute each variant of the skyline. A theoretical complexity analysis of the proposed algorithms is also provided.

  • 3. We conduct a set of thorough experiments to study and analyze the relevance and effectiveness of the proposed approaches. A comparative study between these approaches and the Goncalves and Tineo approach [19] is performed as well.
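The idea behind the first contribution, a dominance relation made more demanding so that fewer points are eliminated, can be illustrated with a small Python sketch. This is not the MP2R definition, which is given in Section 3. The "much better" membership function, the per-attribute thresholds (g1, g2), the min/max aggregation and the cut-off `threshold` are all assumptions chosen for illustration, with smaller values taken to be better.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Thresholds = Tuple[float, float]  # hypothetical (g1, g2) pair per attribute


def much_better(gap: float, g1: float, g2: float) -> float:
    """Degree in [0, 1] to which an advantage `gap` counts as 'much better'."""
    if gap <= g1:
        return 0.0
    if gap >= g2:
        return 1.0
    return (gap - g1) / (g2 - g1)


def fuzzy_dominance(p: Point, q: Point, params: List[Thresholds]) -> float:
    """Degree to which p is much better than q on every attribute (min)."""
    return min(much_better(qi - pi, g1, g2)
               for pi, qi, (g1, g2) in zip(p, q, params))


def relaxed_skyline(points: List[Point], params: List[Thresholds],
                    threshold: float = 0.5) -> List[Tuple[Point, float]]:
    """Keep each point whose degree of not being fuzzily dominated is high enough."""
    kept = []
    for i, q in enumerate(points):
        worst = max((fuzzy_dominance(p, q, params)
                     for j, p in enumerate(points) if j != i), default=0.0)
        degree = 1.0 - worst
        if degree >= threshold:
            kept.append((q, degree))
    return kept


# Hypothetical hotels (price, distance in km) and illustrative thresholds.
hotels = [(50, 2.0), (60, 1.0), (55, 2.5), (120, 6.0)]
params = [(10.0, 40.0), (0.5, 3.0)]
print(relaxed_skyline(hotels, params))
```

Under these assumed thresholds, the hotel (55, 2.5) is Pareto-dominated by (50, 2.0) but only by a small margin, so it re-enters the relaxed result, while (120, 6.0) remains excluded because it is much worse on both attributes.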

The paper is structured as follows: In Section 2, we introduce some basic notions about fuzzy set theory and skyline queries. Section 3 describes the approaches MP2R and C2R for relaxing the skyline and discusses the semantic basis of each of them. The computation part of the relaxed variants of skyline is presented in Section 4. In Section 5, we provide an overview of existing works. Section 6 is devoted to the experimental study. Finally, we conclude and discuss some perspectives in Section 7.

Section snippets

Fuzzy sets

The concept of fuzzy sets was developed by Zadeh [23] in 1965 to represent classes or sets whose boundaries are imprecise. Fuzzy sets can describe gradual transitions between full membership and rejection. Typical examples of such fuzzy classes are those described by natural-language adjectives or adverbs, such as not expensive, fast and very close.
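A gradual class such as "not expensive" can be pictured as a function that assigns each price a degree between 0 and 1. The sketch below is only an illustration: the function name and the two breakpoints (fully satisfied up to 50, fully rejected from 100) are hypothetical values, not taken from the paper.

```python
def not_expensive(price: float, full: float = 50.0, zero: float = 100.0) -> float:
    """Degree in [0, 1] to which `price` is 'not expensive'.

    Assumed shape: degree 1 up to `full`, degree 0 from `zero` onwards,
    and a linear decrease in between.
    """
    if price <= full:
        return 1.0
    if price >= zero:
        return 0.0
    return (zero - price) / (zero - full)


for price in (40, 75, 120):
    print(price, not_expensive(price))
```

The linear transition between the two breakpoints is what distinguishes this from a crisp threshold, where a price of 51 would be rejected outright.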

Formally, a fuzzy set F on the universe X is described by a membership function $\mu_F : X \to [0,1]$, where $\mu_F(x)$ represents the degree of membership of x in F. By

Fuzzy skyline relaxation

We discuss here our fuzzy approaches to skyline relaxation. Let Srelax and SFE be the relaxed skylines returned by the approaches MP2R and C2R, respectively, which are described formally in Sections 3.1 and 3.2.

Both approaches rely on the main idea of computing the extent to which a point, discarded by the Pareto-dominance relationship, may belong to the relaxed skyline. To this end, and as will be illustrated further, we associate with each skyline attribute $A_i$ ($i \in \{1, \ldots, d\}$) a pair of

Srelax and SFE Computation

To compute the two relaxed variants of the skyline, i.e. Srelax and SFE, we propose a two-step procedure (see Fig. 8): (i) the skyline computation step; and (ii) the skyline relaxation step.
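Step (i) can be illustrated with a window-based scan in the spirit of the block-nested-loops (BNL) algorithm of [1]. The sketch below is a plain in-memory BNL, not the authors' improved variant: the helper name `skyline_compare`, its return convention and the assumption that smaller values are better are all hypothetical, and the bounded window with temporary files of the original BNL is omitted.

```python
from typing import List, Sequence

Point = Sequence[float]


def skyline_compare(u: Point, v: Point) -> int:
    """Return 1 if u dominates v, -1 if v dominates u, 0 otherwise."""
    u_better = any(a < b for a, b in zip(u, v))
    v_better = any(b < a for a, b in zip(u, v))
    if u_better and not v_better:
        return 1
    if v_better and not u_better:
        return -1
    return 0  # incomparable or identical


def bnl_skyline(points: List[Point]) -> List[Point]:
    """Scan the points once, keeping a window of non-dominated candidates."""
    window: List[Point] = []
    for u in points:
        dominated = False
        survivors: List[Point] = []
        for w in window:
            outcome = skyline_compare(w, u)
            if outcome == 1:       # a candidate dominates u: discard u
                dominated = True
                break
            if outcome == 0:       # incomparable: the candidate stays
                survivors.append(w)
            # outcome == -1: u dominates the candidate, so it is dropped
        if dominated:
            continue               # window is left unchanged
        survivors.append(u)
        window = survivors
    return window


hotels = [(120, 6.0), (55, 2.5), (50, 2.0), (60, 1.0)]
print(bnl_skyline(hotels))
```

The points discarded in this step are exactly the candidates that step (ii) would examine for possible re-admission into the relaxed skyline.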

In the first step, we calculate the regular skyline S using LIBNL (see Algorithm 1), a slightly improved version of the BNL algorithm proposed in [1]. The LIBNL algorithm uses a function named SkylineCompare($u_i$, $u_j$) to evaluate the dominance, in the sense of Pareto, between $u_i$ and $u_j$ on all

Related work

Our study can be related to the previous works on skyline computation and controlling the skyline size. In this section, we review the major existing approaches on these two topics.

Experimental study

In this section, we present the experimental study that we have conducted. The aim of this study is to demonstrate the effectiveness of the proposed approaches (MP2R and C2R) and their ability to relax small skylines with the most interesting points. In addition, this study allows us to develop a comparative assessment of the quantitative and qualitative aspects of the relaxation process between our two approaches and the approach proposed by Goncalves and Tineo (denoted GT approach)

Conclusion and perspectives

In this paper, we have addressed the problem of skyline relaxation in a controlled way. The basic idea is to make the skyline more permissive by adding points that, strictly speaking, do not belong to the skyline but are not far from belonging to it. We have explored two strategies: first, we propose to recover the most interesting points among those discarded by the Pareto dominance relationship by leveraging a novel fuzzy dominance relationship, Much Preferred (MP). Then, we advocate enlarging the

References (43)

  • M.D. Morse et al., Efficient continuous skyline computation, Inf. Sci. (2007)
  • L.A. Zadeh, Fuzzy sets, Inf. Control (1965)
  • S. Börzsönyi, D. Kossmann, K. Stocker, The skyline operator, in: Proceedings of the 17th International Conference on...
  • C.Y. Chan, H.V. Jagadish, K. Tan, A.K.H. Tung, Z. Zhang, Finding k-dominant skylines in high dimensional space, in:...
  • M.L. Yiu et al., Efficient processing of top-k dominating queries on multi-dimensional data
  • J.J. Levandoski, M.F. Mokbel, M.E. Khalefa, Flexpref: A framework for extensible preference evaluation in database...
  • A. Vlachou, C. Doulkeridis, Y. Kotidis, M. Vazirgiannis, SKYPEER: efficient subspace skyline computation over...
  • A. Hadjali, O. Pivert, H. Prade, Possibilistic contextual skylines with incomplete preferences, in: Second...
  • M.E. Khalefa, M.F. Mokbel, J.J. Levandoski, Skyline query processing for incomplete data, in: Proceedings of the 24th...
  • J. Pei et al., Probabilistic skylines on uncertain data
  • A.A. Alwan et al., Processing skyline queries in incomplete distributed databases, J. Intell. Inf. Syst. (2017)
  • W. Balke et al., Restricting skyline sizes using weak pareto dominance, Inform. Forsch. Entwickl. (2007)
  • K. Abbaci, A. Hadjali, L. Lietard, D. Rocacher, A linguistic quantifier-based approach for skyline refinement, in:...
  • A. Hadjali, O. Pivert, H. Prade, On different types of fuzzy skylines, in: Foundations of Intelligent Systems - 19th...
  • M. Endres, W. Kießling, Skyline snippets, in: Flexible Query Answering Systems - 9th International Conference, FQAS...
  • E. Hüllermeier, I. Vladimirskiy, B. Prados-Suárez, E. Stauch, Supporting case-based retrieval by similarity skylines:...
  • X. Lin, Y. Yuan, Q. Zhang, Y. Zhang, Selecting stars: The k most representative skyline operator, in: Proceedings of...
  • C.Y. Chan, H.V. Jagadish, K. Tan, A.K.H. Tung, Z. Zhang, On high dimensional skylines, in: Advances in Database...
  • D. Papadias, Y. Tao, G. Fu, B. Seeger, An optimal and progressive algorithm for skyline queries, in: Proceedings of the...
  • M. Goncalves, L. Tineo, Fuzzy dominance skyline queries, in: Database and Expert Systems Applications, 18th...
  • W. Jin, J. Han, M. Ester, Mining thick skylines over large databases, in: Knowledge Discovery in Databases: PKDD 2004,...