research-article

Informative Summarization of Numeric Data

Published: 23 July 2019

Abstract

We consider the following data summarization problem. We are given a dataset including ordinal or numeric explanatory attributes and an outcome attribute. We want to produce a summary of how the explanatory attributes affect the outcome attribute. The summary must be human-interpretable, concise, and informative in the sense that it can accurately approximate the distribution of the outcome attribute. We propose a solution that addresses the fundamental challenge of this problem, namely handling large numeric domains, and we experimentally show the effectiveness and efficiency of our approach on real datasets.
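
To make the problem statement concrete, the following minimal Python sketch shows one way a range-based summary could be scored for informativeness. Everything in it is an assumption made for illustration and is not taken from the paper: the toy rows, the rule format (inclusive ranges over numeric attributes), the last-matching-rule estimator, and the use of average log loss as the accuracy measure.

```python
import math

# Toy dataset (made-up values): (temperature, humidity, binary outcome).
rows = [
    (18, 30, 0), (19, 35, 0), (21, 40, 0), (22, 45, 1),
    (24, 50, 1), (25, 55, 1), (27, 60, 1), (28, 65, 0),
]

def matches(rule, row):
    """A rule maps an attribute index to an inclusive (low, high) range.
    Attributes absent from the rule act as wildcards."""
    return all(lo <= row[i] <= hi for i, (lo, hi) in rule.items())

def rule_mean(rule, data):
    """Average outcome over the rows that the rule covers."""
    covered = [r[-1] for r in data if matches(rule, r)]
    return sum(covered) / len(covered)

def estimate(summary, row, data):
    """Estimate a row's outcome from the last matching rule in the summary.
    Assumes the summary starts with the all-wildcard rule, so a match exists."""
    est = None
    for rule in summary:
        if matches(rule, row):
            est = rule_mean(rule, data)
    return est

def avg_log_loss(summary, data):
    """Average log loss of the estimates against the actual binary outcomes.
    Lower values mean the summary approximates the outcome more accurately."""
    eps = 1e-9  # clip estimates away from 0 and 1 to keep log() finite
    total = 0.0
    for r in data:
        p = min(max(estimate(summary, r, data), eps), 1 - eps)
        y = r[-1]
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

baseline = [{}]                # a single all-wildcard rule
refined = [{}, {0: (22, 27)}]  # plus: temperature in [22, 27]

print("baseline loss:", avg_log_loss(baseline, rows))
print("refined loss: ", avg_log_loss(refined, rows))
```

In this toy setting, adding one range rule lowers the loss, which is the sense in which a more informative summary approximates the outcome distribution better. The difficulty the abstract points to is that a numeric attribute admits a very large number of candidate ranges, so enumerating them naively as above would not scale.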

Published In

SSDBM '19: Proceedings of the 31st International Conference on Scientific and Statistical Database Management
July 2019
244 pages
ISBN:9781450362160
DOI:10.1145/3335783

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SSDBM '19

Acceptance Rates

Overall Acceptance Rate 56 of 146 submissions, 38%

Cited By

  • (2024) Discovering approximate implicit domain orders through order dependencies. The VLDB Journal 33:5 (1257-1282). DOI: 10.1007/s00778-024-00847-y. Online publication date: 21-May-2024
  • (2023) iORDER: Mining Implicit Domain Orders. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (3623-3626). DOI: 10.1109/ICDE55515.2023.00283. Online publication date: Apr-2023
  • (2023) Cluster based similarity extraction upon distributed datasets. Cluster Computing 27:3 (2917-2929). DOI: 10.1007/s10586-023-04116-5. Online publication date: 25-Aug-2023
  • (2022) Guided exploration of data summaries. Proceedings of the VLDB Endowment 15:9 (1798-1807). DOI: 10.14778/3538598.3538603. Online publication date: 1-May-2022
  • (2022) Discovering Domain Orders via Order Dependencies. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (1098-1110). DOI: 10.1109/ICDE53745.2022.00087. Online publication date: May-2022
  • (2021) Exploring Ratings in Subjective Databases. Proceedings of the 2021 International Conference on Management of Data (62-75). DOI: 10.1145/3448016.3457259. Online publication date: 9-Jun-2021
  • (2021) Towards Scientific Data Synthesis Using Deep Learning and Semantic Web. The Semantic Web: ESWC 2021 Satellite Events (54-59). DOI: 10.1007/978-3-030-80418-3_10. Online publication date: 21-Jul-2021
  • (2021) A Review of Graph-Based Extractive Text Summarization Models. Innovative Systems for Intelligent Health Informatics (439-448). DOI: 10.1007/978-3-030-70713-2_41. Online publication date: 6-May-2021
  • (2019) Energy Time-Series Features for Emerging Applications on the Basis of Human-Readable Machine Descriptions. Proceedings of the Tenth ACM International Conference on Future Energy Systems (474-481). DOI: 10.1145/3307772.3331022. Online publication date: 15-Jun-2019