DOI: 10.1145/2702613.2732933
Work in Progress

Quarry: Picking From Examples to Explore Big Data

Published: 18 April 2015

Abstract

Analysts use scripts, visualization tools, and spreadsheets as they process and understand data. We focus on two phases of analysts' work: discovery, where the field definitions are understood, and profiling, where assumptions are tested by searching, observing, and running counts on data. A lack of data exploration and understanding can lead to faulty assumptions and misinterpretation. In practice, analysts use SQL queries and scripts to subset big data, reducing it to a size suitable for visualization or spreadsheet pivots. However, because such data are large and high-dimensional, it is challenging to determine precise subsets of interest without thorough data exploration and discovery. We reduce the cost of previewing subsets by combining search with an information-rich visualization of high-dimensional data. To enable discovery and profiling, Quarry supports (1) rapid query generation and visualized search, and (2) defining and previewing subsets of data for potential export and further processing. This work presents the design of Quarry and results from a formative study involving 11 analysts/data scientists and a dataset with 80 columns and 15 million rows.
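The abstract does not describe how Quarry is implemented. The following is only a minimal Python sketch, under our own assumptions, of the two capabilities it names: generating a query from example values an analyst picks, and previewing the resulting subset by running counts on it. The functions `build_where_clause`, `matches`, and `preview`, and the toy `state`/`carrier`/`delay` columns, are hypothetical. The rule of OR-ing values within a column and AND-ing across columns is the usual faceted-search convention, not something the paper states.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Set, Tuple

Row = Dict[str, Any]
Picks = Dict[str, Set[Any]]  # hypothetical: column name -> example values the analyst picked


def sql_literal(value: Any) -> str:
    """Render a value as a SQL literal, doubling single quotes inside strings."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_where_clause(picks: Picks) -> str:
    """Generate a WHERE clause from picked examples.

    Values picked within one column are OR-ed (an IN list); different columns
    are AND-ed. Columns and values are sorted so the output is deterministic.
    """
    clauses = []
    for column in sorted(picks):
        values = sorted(picks[column], key=str)
        if values:  # a column with no picks imposes no constraint
            rendered = ", ".join(sql_literal(v) for v in values)
            clauses.append(f"{column} IN ({rendered})")
    return " AND ".join(clauses) if clauses else "TRUE"


def matches(row: Row, picks: Picks) -> bool:
    """In-memory equivalent of the generated WHERE clause, used for previewing."""
    return all(row.get(col) in vals for col, vals in picks.items() if vals)


def preview(
    rows: Iterable[Row], picks: Picks, top_k: int = 3
) -> Tuple[int, Dict[str, List[Tuple[Any, int]]]]:
    """Profile the subset: its row count plus the top_k most common values per column."""
    total = 0
    counts: Dict[str, Counter] = {}
    for row in rows:
        if not matches(row, picks):
            continue
        total += 1
        for col, val in row.items():
            counts.setdefault(col, Counter())[val] += 1
    return total, {col: c.most_common(top_k) for col, c in counts.items()}


if __name__ == "__main__":
    rows = [
        {"state": "WA", "carrier": "AS", "delay": 5},
        {"state": "WA", "carrier": "DL", "delay": 0},
        {"state": "OR", "carrier": "AS", "delay": 12},
        {"state": "CA", "carrier": "UA", "delay": 3},
    ]
    picks = {"state": {"WA", "OR"}, "carrier": {"AS"}}
    print(build_where_clause(picks))
    # carrier IN ('AS') AND state IN ('OR', 'WA')
    total, profile = preview(rows, picks)
    print(total, profile["carrier"])
    # 2 [('AS', 2)]
```

Keeping the generated clause deterministic makes successive refinements easy to compare and export. At the scale of the study dataset (15 million rows), a real system would more likely push both the filter and the counts into the backing store, for example as a Hive query or an Elasticsearch aggregation, rather than scanning rows in Python.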



    Published In

    CHI EA '15: Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems
    April 2015
    2546 pages
    ISBN:9781450331463
    DOI:10.1145/2702613
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. big data
    2. query generation
    3. search
    4. visualization

    Qualifiers

    • Work in progress

    Conference

    CHI '15: CHI Conference on Human Factors in Computing Systems
    April 18-23, 2015
    Seoul, Republic of Korea

    Acceptance Rates

    CHI EA '15 Paper Acceptance Rate 379 of 1,520 submissions, 25%;
    Overall Acceptance Rate 6,164 of 23,696 submissions, 26%
