ABSTRACT
To support effective data exploration, there has been growing interest in solutions that automatically recommend data visualizations revealing interesting and useful data-driven insights. In such solutions, a large number of possible data visualization views are generated and ranked according to some metric of importance (e.g., a deviation-based metric), and the top-k most important views are then recommended. One drawback of that approach is that it often recommends similar views, leaving the data analyst with limited insight. To address that limitation, in this work we posit that employing diversification techniques in the process of view recommendation eliminates that redundancy and provides good, concise coverage of the insights to be discovered. To that end, we propose a hybrid objective utility function, which captures both the importance and the diversity of the insights revealed by the recommended views. While traditional diversification methods (e.g., Greedy Construction) in principle provide plausible solutions under our proposed utility function, they incur a high query processing cost. In particular, directly applying such methods leads to a "process-first-diversify-next" approach, in which all possible data visualizations are generated first by executing a large number of aggregate queries. To address that challenge, we propose an integrated scheme called DiVE, which efficiently selects the top-k recommended views based on our hybrid utility function. DiVE leverages the properties of both the importance and diversity metrics to prune a large number of query executions without compromising the quality of recommendations. Our experimental evaluation on real datasets shows the performance gains provided by DiVE.
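To make the idea of a hybrid utility concrete, the following is a minimal sketch of a greedy, "process-first-diversify-next" baseline of the kind the abstract describes. It is not DiVE's pruning scheme, and the abstract does not give the exact utility formula. Everything specific here is an assumption for illustration: the weighted-sum form with trade-off parameter `lam`, the use of Jaccard distance over a view's (dimension, measure, aggregate) triple as the diversity metric, the function names, and the importance scores, which are taken as already computed.

```python
from itertools import combinations


def jaccard_distance(view_a, view_b):
    """Assumed diversity metric: Jaccard distance between the components of two views."""
    a, b = set(view_a), set(view_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def hybrid_utility(selected, importance, lam):
    """Assumed hybrid objective: (1 - lam) * average importance + lam * average pairwise diversity."""
    if not selected:
        return 0.0
    avg_importance = sum(importance[v] for v in selected) / len(selected)
    pairs = list(combinations(selected, 2))
    if pairs:
        avg_diversity = sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
    else:
        avg_diversity = 0.0  # a single view has no pairwise diversity
    return (1.0 - lam) * avg_importance + lam * avg_diversity


def greedy_top_k(importance, k, lam):
    """Greedy construction: repeatedly add the view that maximizes the hybrid utility.

    All importance scores must already be known, which is exactly the
    "process-first-diversify-next" cost the abstract points out.
    """
    remaining = sorted(importance, key=importance.get, reverse=True)
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda v: hybrid_utility(selected + [v], importance, lam))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical views (dimension, measure, aggregate) with made-up importance scores.
importance = {
    ("age", "salary", "AVG"): 0.90,
    ("age", "salary", "SUM"): 0.85,
    ("age", "bonus", "AVG"): 0.80,
    ("city", "hours", "COUNT"): 0.50,
}

print(greedy_top_k(importance, k=2, lam=0.0))  # importance only
print(greedy_top_k(importance, k=2, lam=0.5))  # importance and diversity
```

With `lam=0.0` the sketch returns the two near-duplicate `("age", "salary", ...)` views, which illustrates the redundancy problem. With `lam=0.5` the second pick shifts to the less important but dissimilar `("city", "hours", "COUNT")` view.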
DiVE: Diversifying View Recommendation for Visual Data Exploration