research-article

SkyDiver: a framework for skyline diversification

Published: 18 March 2013

Abstract

Skyline queries have attracted considerable attention from the database community over the last decade, owing to their applicability in a range of domains. However, most existing work tackles the problem from an efficiency standpoint, i.e., returning the skyline as quickly as possible. The user is then presented with the entire skyline set, which is often overwhelming and requires manual inspection to identify the most informative data points. To overcome this shortcoming, we propose a novel approach for selecting the k most diverse skyline points, i.e., the ones that best capture the different aspects of both the skyline and the data set they belong to. We present a novel formulation of diversification which, in contrast to previous proposals, is intuitive because it is based solely on the domination relationships among points. Consequently, additional artificial distance measures (e.g., Lp norms) among skyline points are not required. We present efficient approaches to solving this problem and demonstrate the efficiency and effectiveness of our approach through an extensive experimental evaluation with both real-life and synthetic data sets.
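
To make the idea of dominance-based diversity concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that smaller values are better in every dimension, that the dissimilarity of two skyline points is the Jaccard distance between the sets of points they dominate, and that a simple greedy max-min heuristic picks the k points. The abstract confirms none of these choices, and all function names are hypothetical.

```python
def dominates(p, q):
    """True if p dominates q: no worse in every dimension, strictly better in one.
    Assumption: smaller values are preferred."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dominated_set(p, points):
    """Indices of the points in the data set that p dominates."""
    return frozenset(i for i, q in enumerate(points) if dominates(p, q))


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def diverse_skyline(points, k):
    """Greedy max-min selection of up to k skyline points (illustrative heuristic)."""
    sky = list(dict.fromkeys(skyline(points)))  # drop exact duplicates, keep order
    if not sky or k <= 0:
        return []
    dom = {p: dominated_set(p, points) for p in sky}
    # Seed with the skyline point that dominates the most data points.
    chosen = [max(sky, key=lambda p: len(dom[p]))]
    while len(chosen) < min(k, len(sky)):
        rest = [p for p in sky if p not in chosen]
        # Add the point whose closest already-chosen point is farthest away.
        chosen.append(max(
            rest,
            key=lambda p: min(jaccard_distance(dom[p], dom[c]) for c in chosen),
        ))
    return chosen


if __name__ == "__main__":
    data = [(1, 9), (2, 8), (9, 1), (3, 9), (2, 9), (9, 2), (10, 10), (5, 5)]
    print(skyline(data))
    print(diverse_skyline(data, 2))
```

In this toy data set, (1, 9) and (2, 8) dominate exactly the same points, so under the assumed measure they are redundant. The greedy step therefore prefers a skyline point with a different dominated set, and no Lp distance between the points themselves is ever computed.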

Published In

EDBT '13: Proceedings of the 16th International Conference on Extending Database Technology
March 2013
793 pages
ISBN: 9781450315975
DOI: 10.1145/2452376
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MinHashing
  2. approximation
  3. diversity
  4. skyline

Qualifiers

  • Research-article

Funding Sources

  • EU and Greek National funds

Conference

EDBT/ICDT '13

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Cited By

  • (2023) Continuous $k$-Regret Minimization Queries: A Dynamic Coreset Approach. IEEE Transactions on Knowledge and Data Engineering, 35:6 (5680-5694). DOI: 10.1109/TKDE.2022.3166835. Online publication date: 1-Jun-2023
  • (2023) Polynomial algorithms for p-dispersion problems in a planar Pareto Front. RAIRO - Operations Research, 57:2 (857-880). DOI: 10.1051/ro/2023034. Online publication date: 28-Apr-2023
  • (2019) A unified optimization algorithm for solving "regret-minimizing representative" problems. Proceedings of the VLDB Endowment, 13:3 (239-251). DOI: 10.14778/3368289.3368291. Online publication date: 1-Nov-2019
  • (2018) k-Skyband query answering with differential privacy. Journal of Computer Security, 26:5 (647-676). DOI: 10.3233/JCS-171101. Online publication date: 9-Aug-2018
  • (2018) SkyLens: Visual Analysis of Skyline on Multi-Dimensional Data. IEEE Transactions on Visualization and Computer Graphics, 24:1 (246-255). DOI: 10.1109/TVCG.2017.2744738. Online publication date: Jan-2018
  • (2018) Diverse nearest neighbors queries using linear skylines. Geoinformatica, 22:4 (815-844). DOI: 10.1007/s10707-018-0332-7. Online publication date: 1-Oct-2018
  • (2017) Towards Spatially- and Category-Wise k-Diverse Nearest Neighbors Queries. Advances in Spatial and Temporal Databases (163-181). DOI: 10.1007/978-3-319-64367-0_9. Online publication date: 22-Jul-2017
  • (2017) Differentially Private K-Skyband Query Answering Through Adaptive Spatial Decomposition. Data and Applications Security and Privacy XXXI (142-163). DOI: 10.1007/978-3-319-61176-1_8. Online publication date: 22-Jun-2017
  • (2015) Multiple Radii DisC Diversity. ACM Transactions on Database Systems, 40:1 (1-43). DOI: 10.1145/2699499. Online publication date: 25-Mar-2015
  • (2015) Incremental evaluation of top-k combinatorial metric skyline query. Knowledge-Based Systems, 74:1 (89-105). DOI: 10.1016/j.knosys.2014.11.009. Online publication date: 1-Jan-2015
