research-article

On "one of the few" objects

Authors:

Pankaj K. Agarwal,

Cong Yu

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 1487 - 1495

https://doi.org/10.1145/2339530.2339762

Published: 12 August 2012

Abstract

Objects with multiple numeric attributes can be compared within any "subspace" (subset of attributes). In applications such as computational journalism, users are interested in claims of the form: "Karl Malone is one of the only two players in NBA history with at least 25,000 points, 12,000 rebounds, and 5,000 assists in his career." One challenge in identifying such "one-of-the-k" claims (k = 2 above) is ensuring their "interestingness". A small k alone is not a good indicator of interestingness, as one can often make such claims for many objects by increasing the dimensionality of the subspace considered. We propose a uniqueness-based interestingness measure for one-of-the-few claims that is intuitive for non-technical users, and we design algorithms for finding all interesting claims (across all subspaces) from a dataset. Sometimes, users are interested primarily in the objects appearing in these claims. Building on our notion of interesting claims, we propose a scheme for ranking objects and an algorithm for computing the top-ranked objects. Using real-world datasets, we evaluate the efficiency of our algorithms as well as the advantage of our object-ranking scheme over popular methods such as Kemeny optimal rank aggregation and weighted-sum ranking.
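
The observation that a small k can be manufactured by adding attributes can be seen on a toy example. The following is a minimal sketch, not the paper's algorithm: it assumes a claim for an object uses that object's own attribute values as thresholds, so k is the number of objects that match or exceed it on every attribute of the chosen subspace. The dataset and the names `PLAYERS`, `claim_size`, and `all_claims` are hypothetical and introduced only for illustration.

```python
from itertools import combinations

# Hypothetical toy data (not real statistics): name -> (points, rebounds, assists).
PLAYERS = {
    "A": (30, 12, 5),
    "B": (28, 14, 3),
    "C": (25, 10, 9),
    "D": (20, 15, 8),
    "E": (22, 11, 6),
}


def claim_size(data, name, subspace):
    """Return k for the claim '`name` is one of the k objects with at least
    its own values on every attribute index in `subspace`'."""
    target = data[name]
    return sum(
        all(values[i] >= target[i] for i in subspace)
        for values in data.values()
    )


def all_claims(data, name):
    """Map every non-empty subspace (tuple of attribute indices) to its k."""
    num_attrs = len(data[name])
    return {
        subspace: claim_size(data, name, subspace)
        for size in range(1, num_attrs + 1)
        for subspace in combinations(range(num_attrs), size)
    }


if __name__ == "__main__":
    for subspace, k in all_claims(PLAYERS, "E").items():
        print(subspace, k)
```

In this toy data, object E is not the best on any single attribute, yet its k shrinks as attributes are added and reaches 1 in the full three-attribute subspace. This is the effect that makes k by itself a weak signal of interestingness; the uniqueness-based measure proposed in the paper is not reproduced here.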

Published In

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2012

1616 pages

ISBN:9781450314626

DOI:10.1145/2339530

General Chair: Qiang Yang (Hong Kong University of Science and Technology)

Program Chairs: Deepak Agarwal (LinkedIn), Jian Pei (Simon Fraser University)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '12

KDD '12: The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 12 - 16, 2012

Beijing, China

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Total Citations: 13
Total Downloads: 402
Downloads (last 12 months): 1
Downloads (last 6 weeks): 0

Reflects downloads up to 17 Jan 2025

Cited By

Niu Y, Li Y, Karras P, Wang Y, Li Z (2024). Discovering Personalized Characteristic Communities in Attributed Graphs. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 2834-2847. Online publication date: 13-May-2024. https://doi.org/10.1109/ICDE60146.2024.00221

Trummer I, Das S, Pandis I, Selçuk Candan K, Amer-Yahia S (2023). Demonstrating NaturalMiner: Searching Large Data Sets for Abstract Patterns Described in Natural Language. Companion of the 2023 International Conference on Management of Data, pages 139-142. Online publication date: 4-Jun-2023. https://dl.acm.org/doi/10.1145/3555041.3589694

Trummer I (2022). BABOONS. Proceedings of the VLDB Endowment, 15:11, pages 2980-2993. Online publication date: 29-Sep-2022. https://dl.acm.org/doi/10.14778/3551793.3551846

Yang Y, Li Y, Karras P, Tung A, Zhu F, Chin Ooi B, Miao C, Wang H, Skrypnyk I, Hsu W, Chawla S (2021). Context-aware Outstanding Fact Mining from Knowledge Graphs. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 2006-2016. Online publication date: 14-Aug-2021. https://dl.acm.org/doi/10.1145/3447548.3467272

Bai R, Hon W, Lo E, He Z, Zhu K (2019). Historic Moments Discovery in Sequence Data. ACM Transactions on Database Systems, 44:1, pages 1-33. Online publication date: 29-Jan-2019. https://dl.acm.org/doi/10.1145/3276975

Zhang G, Jimenez D, Li C, Das G, Jermaine C, Bernstein P (2018). Maverick. Proceedings of the 2018 International Conference on Management of Data, pages 1317-1332. Online publication date: 27-May-2018. https://dl.acm.org/doi/10.1145/3183713.3183730

Wang W, Tang B, Zhu M (2018). Efficient Longest Streak Discovery in Multidimensional Sequence Data. Web and Big Data, pages 166-181. Online publication date: 19-Jul-2018. https://doi.org/10.1007/978-3-319-96893-3_13

Wu Y, Agarwal P, Li C, Yang J, Yu C (2017). Computational Fact Checking through Query Perturbations. ACM Transactions on Database Systems, 42:1, pages 1-41. Online publication date: 9-Jan-2017. https://dl.acm.org/doi/10.1145/2996453

Fan Q, Li Y, Zhang D, Tan K (2017). Discovering Newsworthy Themes from Sequenced Data: A Step Towards Computational Journalism. IEEE Transactions on Knowledge and Data Engineering, 29:7, pages 1398-1411. Online publication date: 1-Jul-2017. https://doi.org/10.1109/TKDE.2017.2685587

Walenz B, Yang J (2016). Perturbation analysis of database queries. Proceedings of the VLDB Endowment, 9:14, pages 1635-1646. Online publication date: 1-Oct-2016. https://dl.acm.org/doi/10.14778/3007328.3007330
