DOI: 10.1145/2501511.2501515
research-article

Building blocks for exploratory data analysis tools

Published: 11 August 2013

Abstract

Data exploration is largely manual and labor intensive. Although various tools and statistical techniques can be applied to data sets, there is little help in identifying what questions to ask of a data set, let alone what domain knowledge is useful in answering them. In this paper, we study user queries against production data sets in Splunk. Specifically, we use latent semantic analysis to characterize the interplay between data sets and the operations used to analyze them, and we discuss how this characterization serves as a building block for a data analysis recommendation system. This is a work-in-progress paper.
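
To make the idea in the abstract concrete, the following is a minimal sketch of latent semantic analysis applied to a data-set-by-operation count matrix. It is an illustration under stated assumptions, not the authors' method: the data set names, operation names, counts, and the helper functions `lsa_embed`, `cosine`, and `similar_datasets` are all hypothetical, and the abstract does not say which features or weighting the paper uses. The sketch takes a truncated SVD of the matrix and compares data sets by cosine similarity in the resulting low-rank space.

```python
import numpy as np

# Hypothetical toy data (not from the paper): rows are data sets,
# columns are query operations, and each cell counts how often an
# operation was applied to a data set.
datasets = ["web_access", "app_errors", "cpu_metrics", "disk_metrics"]
operations = ["search", "regex_extract", "top", "avg", "timechart"]
counts = np.array([
    [9.0, 6.0, 5.0, 0.0, 1.0],
    [8.0, 7.0, 4.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 8.0, 9.0],
    [0.0, 0.0, 1.0, 7.0, 8.0],
])


def lsa_embed(matrix, k):
    """Return rank-k latent coordinates for rows and columns via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    row_coords = u[:, :k] * s[:k]      # one k-dimensional vector per data set
    col_coords = vt[:k, :].T * s[:k]   # one k-dimensional vector per operation
    return row_coords, col_coords


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


dataset_vecs, operation_vecs = lsa_embed(counts, k=2)


def similar_datasets(name):
    """Rank the other data sets by similarity to `name` in the latent space."""
    i = datasets.index(name)
    scores = [
        (other, cosine(dataset_vecs[i], dataset_vecs[j]))
        for j, other in enumerate(datasets)
        if j != i
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name in datasets:
        print(name, "->", similar_datasets(name)[0][0])
```

In a recommendation setting, data sets that land close together in the latent space could borrow each other's frequently used operations as suggestions; whether the paper does this in exactly this way is an assumption of the sketch.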

Cited By

  • (2022) Detecting Malicious Domains using the Splunk Machine Learning Toolkit. NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, pp. 1-6. DOI: 10.1109/NOMS54207.2022.9789899. Online publication date: 25-Apr-2022
  • (2016) Query-Biased Summaries for Tabular Data. Proceedings of the 21st Australasian Document Computing Symposium, pp. 69-72. DOI: 10.1145/3015022.3015027. Online publication date: 5-Dec-2016
  • (2014) Analyzing log analysis. Proceedings of the 28th USENIX conference on Large Installation System Administration, pp. 53-68. DOI: 10.5555/2717491.2717495. Online publication date: 9-Nov-2014

Published In

IDEA '13: Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics
August 2013
104 pages
ISBN: 9781450323291
DOI: 10.1145/2501511
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '13

Acceptance Rates

IDEA '13 Paper Acceptance Rate: 11 of 25 submissions, 44%
Overall Acceptance Rate: 11 of 25 submissions, 44%
