DOI: 10.1145/3035918.3035922
research-article

Extracting Top-K Insights from Multi-dimensional Data

Published: 9 May 2017

ABSTRACT

OLAP tools have been extensively used by enterprises to make better and faster decisions. Nevertheless, they require users to specify group-by attributes and to know precisely what they are looking for. This paper makes the first attempt at automatically extracting top-k insights from multi-dimensional data. This is useful not only for non-expert users, but also for reducing the manual effort of data analysts. In particular, we propose the concept of an insight, which captures an interesting observation derived from aggregation results in multiple steps (e.g., ranking by a dimension, or computing the percentage of a measure by a dimension). An example insight is: "Brand B's rank (across brands) falls along the year, in terms of the increase in sales". Our problem is to compute the top-k insights according to a score function. It poses challenges in (i) the effectiveness of the result and (ii) the efficiency of computation. To address (i), we propose a meaningful scoring function for insights. To address (ii), we contribute a computation framework for top-k insights, together with a suite of optimization techniques (i.e., pruning, ordering, specialized cube, and computation sharing). Our experimental study on both real and synthetic data verifies the effectiveness and efficiency of the proposed solution.
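The multi-step derivation behind the example insight, and a top-k search over such insights, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' framework or scoring function: the toy sales table, the impact measure (a brand's share of total sales), the significance measure (absolute least-squares slope of the yearly rank series, divided by the number of brands minus one and capped at 1) and every function name below are hypothetical. The search visits candidates in descending impact and stops once a candidate's impact cannot beat the current k-th score, a generic form of the ordering and pruning ideas the abstract mentions.

```python
import heapq
from collections import defaultdict

# Hypothetical toy table of (brand, year, sales) rows; not data from the paper.
ROWS = [
    ("A", 2014, 10), ("A", 2015, 14), ("A", 2016, 20), ("A", 2017, 28),
    ("B", 2014, 12), ("B", 2015, 20), ("B", 2016, 25), ("B", 2017, 26),
    ("C", 2014, 8), ("C", 2015, 10), ("C", 2016, 14), ("C", 2017, 26),
]


def aggregate(rows):
    """Step 1: SUM(sales) grouped by (brand, year)."""
    totals = defaultdict(float)
    for brand, year, sales in rows:
        totals[(brand, year)] += sales
    return dict(totals)


def yearly_increase(totals):
    """Step 2: increase in sales over the previous year, per brand."""
    years = sorted({y for _, y in totals})
    brands = sorted({b for b, _ in totals})
    return {
        (b, cur): totals.get((b, cur), 0.0) - totals.get((b, prev), 0.0)
        for b in brands
        for prev, cur in zip(years, years[1:])
    }


def rank_within_year(increase):
    """Step 3: rank brands by increase within each year (1 = largest increase)."""
    by_year = defaultdict(list)
    for (b, y), v in increase.items():
        by_year[y].append((v, b))
    ranks = {}
    for y, items in by_year.items():
        # Sort by descending increase; break ties by brand name for determinism.
        for r, (_, b) in enumerate(sorted(items, key=lambda t: (-t[0], t[1])), start=1):
            ranks[(b, y)] = r
    return ranks


def slope(ys):
    """Least-squares slope of ys against x = 0, 1, ..., n-1."""
    n = len(ys)
    xbar, ybar = (n - 1) / 2, sum(ys) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(ys))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den if den else 0.0


def top_k_insights(rows, k):
    """Return up to k (score, brand) pairs for hypothetical 'rank trend' insights."""
    if k <= 0:
        return []
    totals = aggregate(rows)
    ranks = rank_within_year(yearly_increase(totals))
    years = sorted({y for _, y in ranks})
    brands = sorted({b for b, _ in ranks})
    grand = sum(totals.values())
    # Hypothetical impact: the brand's share of total sales, in [0, 1].
    impact = {b: sum(v for (bb, _), v in totals.items() if bb == b) / grand for b in brands}
    heap = []  # min-heap holding the current best k (score, brand) pairs
    # Ordering: visit candidates by descending impact so the bound can prune early.
    for b in sorted(brands, key=lambda x: -impact[x]):
        # Pruning: score = impact * significance <= impact (significance <= 1),
        # and every remaining candidate has impact no larger than this one.
        if len(heap) == k and impact[b] <= heap[0][0]:
            break
        series = [ranks[(b, y)] for y in years]
        # Hypothetical significance: |rank slope| / (number of brands - 1), capped at 1.
        sig = min(1.0, abs(slope(series)) / max(1, len(brands) - 1))
        item = (impact[b] * sig, b)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    ranks = rank_within_year(yearly_increase(aggregate(ROWS)))
    print([ranks[("B", y)] for y in (2015, 2016, 2017)])  # [1, 2, 3]
    print(top_k_insights(ROWS, k=2))
```

Rank 1 marks the largest yearly increase, so brand B's series [1, 2, 3] is the kind of falling rank described in the example insight. In this tiny table the bound never fires; a bound of this kind pays off when there are many candidate subspaces and each significance computation is expensive.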

      Published in

        ACM Conferences
        SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
        May 2017
        1810 pages
        ISBN:9781450341974
        DOI:10.1145/3035918

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall acceptance rate: 785 of 4,003 submissions, 20%
