DOI: 10.1145/3318464.3380561
research-article

QUAD: Quadratic-Bound-based Kernel Density Visualization

Published: 31 May 2020

ABSTRACT

Kernel density visualization (KDV) is used to view and understand data points in various domains, including traffic or crime hotspot detection, ecological modeling, chemical geology, and physical modeling. Existing solutions, which are based on computing kernel density estimation (KDE) functions, are computationally expensive. Our goal is to improve the performance of KDV in order to support large datasets (e.g., one million points) and high screen resolutions (e.g., 1280 × 960 pixels). We examine two widely used variants of KDV, namely approximate kernel density visualization (εKDV) and thresholded kernel density visualization (τKDV). For these two operations, we develop a fast solution, called QUAD, by deriving quadratic bounds of KDE functions for different types of kernel functions, including Gaussian, triangular, etc. We further adopt a progressive visualization framework for KDV in order to stream partial visualization results to users continuously. Extensive experimental results show that our new KDV techniques can provide at least a one-order-of-magnitude speedup over existing methods, without degrading visualization quality. We further show that QUAD, combined with the progressive visualization framework, can produce reasonable visualization results in real time (0.5 sec) in a single-machine setting, without using GPUs or parallel computation.
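
The abstract does not spell out the bounds themselves, so what follows is only a minimal Python sketch of the general bound-and-refine idea under our own assumptions; it is not the QUAD algorithm from the paper. It assumes 2-D points, the Gaussian kernel written as exp(-x) with x = gamma * dist(q, p)^2, and a uniform grid standing in for a real spatial index. The point it illustrates is that any quadratic A x^2 + B x + C summed over a group of points depends only on the sums of x and x^2, and both expand into per-group aggregates (sum of p, of ||p||^2, of ||p||^4, of p p^T, and of ||p||^2 p) that can be precomputed once, so bounding a whole group costs a constant amount of work per pixel instead of one kernel evaluation per point. The particular quadratics used here (second-order Taylor polynomials of exp(-x) centred at the two ends of the group's distance range, plus the chord as an extra upper bound) are an illustrative choice whose validity follows from the Taylor remainder and convexity; the paper derives its own quadratic bounds, which this sketch does not reproduce. We read the approximate variant (εKDV) as returning a value within a factor (1 ± ε) of the exact density, and the thresholded variant (τKDV) as deciding whether the density reaches a threshold τ. All names (Group, group_bounds, refine, eps_kdv, tau_kdv, and so on) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical per-group aggregates, precomputed once before rendering."""
    pts: list    # raw points, touched only when the group is evaluated exactly
    n: int       # number of points
    s1: tuple    # sum of p
    s2: float    # sum of |p|^2
    s4: float    # sum of |p|^4
    mxx: float   # entries of sum of p p^T
    mxy: float
    myy: float
    s2p: tuple   # sum of |p|^2 * p
    lo: tuple    # bounding-box corners
    hi: tuple


def make_group(pts):
    """One pass over the points to fill in the aggregates used by group_bounds."""
    sq = [x * x + y * y for x, y in pts]
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    return Group(
        pts=pts, n=len(pts), s1=(sum(xs), sum(ys)),
        s2=sum(sq), s4=sum(c * c for c in sq),
        mxx=sum(x * x for x in xs), mxy=sum(x * y for x, y in pts), myy=sum(y * y for y in ys),
        s2p=(sum(c * x for c, x in zip(sq, xs)), sum(c * y for c, y in zip(sq, ys))),
        lo=(min(xs), min(ys)), hi=(max(xs), max(ys)))


def grid_groups(points, cell):
    """Bucket points into square cells of side `cell` (stand-in for a spatial index)."""
    buckets = {}
    for x, y in points:
        buckets.setdefault((int(x // cell), int(y // cell)), []).append((x, y))
    return [make_group(b) for b in buckets.values()]


def exact_density(q, pts, gamma):
    """Exact Gaussian KDE value at q: sum_i exp(-gamma * |q - p_i|^2)."""
    return sum(math.exp(-gamma * ((q[0] - x) ** 2 + (q[1] - y) ** 2)) for x, y in pts)


def group_bounds(q, g, gamma):
    """Illustrative O(1) bounds on sum_{p in g} exp(-x_p), x_p = gamma * |q - p|^2."""
    qx, qy = q
    a = qx * qx + qy * qy
    qs1 = qx * g.s1[0] + qy * g.s1[1]
    qmq = qx * qx * g.mxx + 2 * qx * qy * g.mxy + qy * qy * g.myy
    qs2p = qx * g.s2p[0] + qy * g.s2p[1]
    # Sums of x_p and x_p^2, expanded from |q - p|^2 = a - 2 q.p + |p|^2.
    sum_x = gamma * (g.n * a - 2 * qs1 + g.s2)
    sum_x2 = gamma ** 2 * (g.n * a * a + 4 * qmq - 4 * a * qs1
                           + 2 * a * g.s2 + g.s4 - 4 * qs2p)
    # Every x_p lies in [xmin, xmax], taken from the group's bounding box.
    dmin2 = sum(max(lo_c - c, 0.0, c - hi_c) ** 2 for c, lo_c, hi_c in zip(q, g.lo, g.hi))
    dmax2 = sum(max(abs(c - lo_c), abs(c - hi_c)) ** 2 for c, lo_c, hi_c in zip(q, g.lo, g.hi))
    xmin, xmax = gamma * dmin2, gamma * dmax2

    def taylor_sum(t):  # group sum of exp(-t) * (1 - (x - t) + (x - t)^2 / 2)
        e = math.exp(-t)
        return e * (sum_x2 / 2 - (1 + t) * sum_x + (1 + t + t * t / 2) * g.n)

    # exp(-x) has a negative third derivative, so its quadratic Taylor polynomial
    # lies below it to the left of the centre and above it to the right.
    lower, upper = taylor_sum(xmax), taylor_sum(xmin)
    if xmax > xmin:  # the chord also lies above the convex exp(-x) on [xmin, xmax]
        slope = (math.exp(-xmax) - math.exp(-xmin)) / (xmax - xmin)
        upper = min(upper, g.n * math.exp(-xmin) + slope * (sum_x - g.n * xmin))
    return lower, upper


def refine(q, groups, gamma, done):
    """Sum group bounds, then evaluate the loosest groups exactly until done(lo, hi)."""
    bounds = [group_bounds(q, g, gamma) for g in groups]
    lo, hi = sum(b[0] for b in bounds), sum(b[1] for b in bounds)
    order = sorted(range(len(groups)), key=lambda i: bounds[i][1] - bounds[i][0], reverse=True)
    for i in order:
        if done(lo, hi):
            break
        exact = exact_density(q, groups[i].pts, gamma)
        lo, hi = lo + exact - bounds[i][0], hi + exact - bounds[i][1]
    return lo, hi


def tau_kdv(q, groups, gamma, tau):
    """Thresholded variant: does the density at pixel q reach tau?"""
    lo, _ = refine(q, groups, gamma, lambda lb, ub: lb >= tau or ub < tau)
    return lo >= tau


def eps_kdv(q, groups, gamma, eps):
    """Approximate variant: returns R with (1 - eps) F(q) <= R <= (1 + eps) F(q)."""
    lo, hi = refine(q, groups, gamma, lambda lb, ub: ub <= (1 + eps) * lb)
    return (lo + hi) / 2
```

For example, `eps_kdv(q, grid_groups(points, 2.5), 0.3, 0.01)` returns an approximate density for a single pixel `q`; rendering a full KDV would call it once per pixel. Returning the midpoint is safe because the stopping rule `hi <= (1 + eps) * lo` keeps both bounds within the required factor of the exact value. A hierarchical index would let the refinement step split a loose group into tighter children instead of falling back directly to exact evaluation.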


Supplemental Material

3318464.3380561.mp4 (MP4, 113.6 MB)


Published in

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
