DOI: 10.1145/2339530.2339762
research-article

On "one of the few" objects

Published: 12 August 2012

Abstract

Objects with multiple numeric attributes can be compared within any "subspace" (subset of attributes). In applications such as computational journalism, users are interested in claims of the form: "Karl Malone is one of the only two players in NBA history with at least 25,000 points, 12,000 rebounds, and 5,000 assists in his career." One challenge in identifying such "one-of-the-k" claims (k = 2 above) is ensuring their "interestingness". A small k is not a good indicator of interestingness, as one can often make such claims for many objects by increasing the dimensionality of the subspace considered. We propose a uniqueness-based interestingness measure for one-of-the-few claims that is intuitive for non-technical users, and we design algorithms for finding all interesting claims (across all subspaces) from a dataset. Sometimes, users are interested primarily in the objects appearing in these claims. Building on our notion of interesting claims, we propose a scheme for ranking objects and an algorithm for computing the top-ranked objects. Using real-world datasets, we evaluate the efficiency of our algorithms as well as the advantage of our object-ranking scheme over popular methods such as Kemeny optimal rank aggregation and weighted-sum ranking.
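
To make the role of k concrete, the following brute-force sketch counts, for a chosen object and subspace, how many objects match or exceed it on every attribute of that subspace. It is an illustration only and not the algorithm proposed in the paper: the function names `claim_k` and `all_claims` are hypothetical, and the toy data is invented and does not reflect real NBA statistics.

```python
from itertools import combinations


def claim_k(objects, subspace, target):
    """Return k for the claim "target is one of the k objects that reach
    target's own values on every attribute in subspace".

    The target always counts itself, so k >= 1.
    """
    thresholds = objects[target]
    return sum(
        all(values[attr] >= thresholds[attr] for attr in subspace)
        for values in objects.values()
    )


def all_claims(objects, attributes, target):
    """Enumerate every non-empty subspace and compute k for the target.

    This is exponential in the number of attributes; it is meant only
    to show the definition, not to be efficient.
    """
    return {
        subspace: claim_k(objects, subspace, target)
        for size in range(1, len(attributes) + 1)
        for subspace in combinations(attributes, size)
    }


# Invented toy data (hypothetical objects A-D, not real players).
toy = {
    "A": {"points": 30, "rebounds": 10, "assists": 5},
    "B": {"points": 25, "rebounds": 12, "assists": 7},
    "C": {"points": 20, "rebounds": 8, "assists": 9},
    "D": {"points": 35, "rebounds": 6, "assists": 4},
}

if __name__ == "__main__":
    for subspace, k in all_claims(toy, ["points", "rebounds", "assists"], "C").items():
        print(subspace, "-> one of the", k)
```

In this toy data, object C is one of four on points alone but the only one once assists are included. That is the effect the abstract warns about: adding dimensions can shrink k without making the claim more interesting.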

Supplementary Material

JPG File (311a_w_talk_8.jpg)
MP4 File (311a_w_talk_8.mp4)

Published In

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012
1616 pages
ISBN:9781450314626
DOI:10.1145/2339530
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computational journalism
  2. fact checking
  3. ranking
  4. skyband

Qualifiers

  • Research-article

Conference

KDD '12

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 17 Jan 2025

Cited By

  • (2024) Discovering Personalized Characteristic Communities in Attributed Graphs. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 2834-2847. DOI: 10.1109/ICDE60146.2024.00221. Online publication date: 13-May-2024
  • (2023) Demonstrating NaturalMiner: Searching Large Data Sets for Abstract Patterns Described in Natural Language. Companion of the 2023 International Conference on Management of Data, pages 139-142. DOI: 10.1145/3555041.3589694. Online publication date: 4-Jun-2023
  • (2022) BABOONS. Proceedings of the VLDB Endowment 15:11, pages 2980-2993. DOI: 10.14778/3551793.3551846. Online publication date: 29-Sep-2022
  • (2021) Context-aware Outstanding Fact Mining from Knowledge Graphs. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 2006-2016. DOI: 10.1145/3447548.3467272. Online publication date: 14-Aug-2021
  • (2019) Historic Moments Discovery in Sequence Data. ACM Transactions on Database Systems 44:1, pages 1-33. DOI: 10.1145/3276975. Online publication date: 29-Jan-2019
  • (2018) Maverick. Proceedings of the 2018 International Conference on Management of Data, pages 1317-1332. DOI: 10.1145/3183713.3183730. Online publication date: 27-May-2018
  • (2018) Efficient Longest Streak Discovery in Multidimensional Sequence Data. Web and Big Data, pages 166-181. DOI: 10.1007/978-3-319-96893-3_13. Online publication date: 19-Jul-2018
  • (2017) Computational Fact Checking through Query Perturbations. ACM Transactions on Database Systems 42:1, pages 1-41. DOI: 10.1145/2996453. Online publication date: 9-Jan-2017
  • (2017) Discovering Newsworthy Themes from Sequenced Data: A Step Towards Computational Journalism. IEEE Transactions on Knowledge and Data Engineering 29:7, pages 1398-1411. DOI: 10.1109/TKDE.2017.2685587. Online publication date: 1-Jul-2017
  • (2016) Perturbation analysis of database queries. Proceedings of the VLDB Endowment 9:14, pages 1635-1646. DOI: 10.14778/3007328.3007330. Online publication date: 1-Oct-2016
