DOI: 10.1145/3318464.3389730
research-article

Marviq: Quality-Aware Geospatial Visualization of Range-Selection Queries Using Materialization

Published: 31 May 2020

Abstract

We study the problem of efficiently visualizing, on a map, a large data set stored in a database, using SQL queries with ad-hoc range conditions on numerical attributes; for example, a spatial scatterplot of taxi pickup events in New York between 1/1/2015 and 3/10/2015. We present a novel middleware-based technique called Marviq. It divides the domain of the selection attribute into intervals and precomputes and stores a visualization for each interval. These precomputed results, called MVS, are stored as tables in the database. We can compute an exact visualization for a request by accessing the MVS and retrieving additional records from the base table for the parts of the query range that the MVS intervals do not fully cover. To further reduce the time spent retrieving these additional records, we present algorithms that use the MVS to compute an approximate visualization satisfying a user-specified similarity threshold. We identify a family of functions with certain properties to which this technique applies. We also present an improvement that divides the MVS intervals into smaller intervals and materializes low-resolution visualizations for them. We report the results of an extensive evaluation of Marviq, including a user study, and show its high performance in both space and time.
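The exact and approximate query paths described above can be illustrated with a small in-memory sketch. It is hypothetical and not Marviq's implementation: it assumes a visualization is the set of occupied pixels on a 64 x 64 grid (`GRID`), that each MVS entry is a Python `frozenset` rather than a database table, that scanning a Python list stands in for retrieving base-table records, and that the similarity function is the Jaccard index between the approximate and exact pixel sets. The Jaccard lower bound used to skip the boundary scan is a swapped-in illustration, not the paper's algorithm, and all helper names are illustrative.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, FrozenSet, List, Set, Tuple

Pixel = Tuple[int, int]
Record = Tuple[float, float, float]  # (selection attribute, x, y) with x, y in [0, 1)
MVS = Dict[int, FrozenSet[Pixel]]    # hypothetical: interval index -> occupied pixels

GRID = 64  # hypothetical output resolution: GRID x GRID pixels


def to_pixel(x: float, y: float) -> Pixel:
    """Map a point in the unit square to the pixel cell it falls into."""
    return (min(int(x * GRID), GRID - 1), min(int(y * GRID), GRID - 1))


def build_mvs(records: List[Record], bounds: List[float]) -> MVS:
    """Precompute the occupied pixels of each interval [bounds[i], bounds[i+1])."""
    cells: Dict[int, Set[Pixel]] = {i: set() for i in range(len(bounds) - 1)}
    for a, x, y in records:
        i = bisect_right(bounds, a) - 1
        if 0 <= i < len(bounds) - 1:
            cells[i].add(to_pixel(x, y))
    return {i: frozenset(p) for i, p in cells.items()}


def split_query(bounds: List[float], lo: float, hi: float):
    """Return the intervals lying fully inside [lo, hi) and the attribute range they cover."""
    first = bisect_left(bounds, lo)      # first interval that starts at or after lo
    last = bisect_right(bounds, hi) - 2  # last interval that ends at or before hi
    full = list(range(first, last + 1))
    covered = (bounds[first], bounds[last + 1]) if full else (hi, hi)  # (hi, hi) is empty
    return full, covered


def in_boundary(a: float, lo: float, hi: float, covered: Tuple[float, float]) -> bool:
    """True if attribute value a is in the query range but outside the MVS-covered part."""
    return lo <= a < hi and not (covered[0] <= a < covered[1])


def exact_visualization(records: List[Record], bounds: List[float], mvs: MVS,
                        lo: float, hi: float) -> Set[Pixel]:
    full, covered = split_query(bounds, lo, hi)
    pixels: Set[Pixel] = set().union(*(mvs[i] for i in full))
    # Stand-in for fetching the boundary records from the base table.
    pixels |= {to_pixel(x, y) for a, x, y in records if in_boundary(a, lo, hi, covered)}
    return pixels


def approximate_visualization(records: List[Record], bounds: List[float], mvs: MVS,
                              lo: float, hi: float, threshold: float) -> Set[Pixel]:
    full, covered = split_query(bounds, lo, hi)
    pixels: Set[Pixel] = set().union(*(mvs[i] for i in full))
    # Assumed to be cheap in a database (an indexed COUNT); here it is a list scan.
    n_boundary = sum(1 for a, _, _ in records if in_boundary(a, lo, hi, covered))
    # pixels is a subset of the exact result E and |E| <= |pixels| + n_boundary,
    # so Jaccard(pixels, E) = |pixels| / |E| is at least the ratio tested below.
    if pixels and len(pixels) / (len(pixels) + n_boundary) >= threshold:
        return pixels
    return exact_visualization(records, bounds, mvs, lo, hi)
```

In this sketch the pixels from fully covered intervals are always a subset of the exact result, so the approximate path can decide from a record count alone whether the boundary fetch can be skipped. A database-backed version would presumably issue that count and the boundary fetch as indexed SQL range queries.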

Supplementary Material

Presentation video (MP4 file: 3318464.3389730.mp4)


Cited By

  • (2022) SAFE. Proceedings of the VLDB Endowment 15(3):513-526. DOI: 10.14778/3494124.3494135. Online publication date: 4 Feb 2022.
  • (2020) A Time-Windowed Data Structure for Spatial Density Maps. Proceedings of the 28th International Conference on Advances in Geographic Information Systems, pages 15-24. DOI: 10.1145/3397536.3422242. Online publication date: 3 Nov 2020.

    Published In

    ACM Conferences
    SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
    June 2020
    2925 pages
    ISBN:9781450367356
    DOI:10.1145/3318464
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Marviq
    2. quality guarantee
    3. spatial data
    4. visualization

    Conference

    SIGMOD/PODS '20

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 27
    • Downloads (last 6 weeks): 6
    Reflects downloads up to 14 Feb 2025
