ABSTRACT
Kernel density visualization (KDV) is used to view and understand data points in various domains, including traffic and crime hotspot detection, ecological modeling, chemical geology, and physical modeling. Existing solutions, which are based on computing kernel density estimation (KDE) functions, are computationally expensive. Our goal is to improve the performance of KDV in order to support large datasets (e.g., one million points) and high screen resolutions (e.g., 1280 × 960 pixels). We examine two widely used variants of KDV, namely approximate kernel density visualization (εKDV) and thresholded kernel density visualization (τKDV). For these two operations, we develop a fast solution, called QUAD, by deriving quadratic bounds of KDE functions for different types of kernel functions, including the Gaussian and triangular kernels. We further adopt a progressive visualization framework for KDV in order to stream partial visualization results to users continuously. Extensive experimental results show that our new KDV techniques provide at least a one-order-of-magnitude speedup over existing methods without degrading visualization quality. We further show that QUAD, when combined with the progressive visualization framework, can produce reasonable visualization results in real time (0.5 sec) in a single-machine setting, without using a GPU or parallel computation.
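To make the cost that the abstract refers to concrete, the following is a minimal sketch of the brute-force baseline, not of QUAD itself. It assumes a Gaussian kernel of the form w · exp(−γ · dist²), evaluates the density at pixel centres, and expresses the approximate and thresholded variants as simple checks on that exact value. The function names, the parameter names `gamma`, `weight`, `eps`, and `tau`, and the pixel-centre convention are all illustrative assumptions rather than anything taken from the paper.

```python
import math

def kde_value(q, points, gamma, weight=1.0):
    """Exact Gaussian KDE value at pixel q (assumed form: sum of w * exp(-gamma * dist^2))."""
    qx, qy = q
    return sum(
        weight * math.exp(-gamma * ((qx - px) ** 2 + (qy - py) ** 2))
        for px, py in points
    )

def kdv_grid(points, width, height, gamma, weight=1.0):
    """Brute-force KDV: one full pass over all points for every pixel centre.

    Cost is O(width * height * len(points)), which is what makes the
    exact computation expensive for large datasets and resolutions.
    """
    return [
        [kde_value((col + 0.5, row + 0.5), points, gamma, weight)
         for col in range(width)]
        for row in range(height)
    ]

def within_relative_error(approx, exact, eps):
    """Approximate variant (assumed definition): approx is within relative error eps of exact."""
    return abs(approx - exact) <= eps * exact

def threshold_mask(grid, tau):
    """Thresholded variant (assumed definition): mark pixels whose density is at least tau."""
    return [[value >= tau for value in row] for row in grid]

# Tiny illustrative dataset: two points on a 4 x 3 pixel grid.
points = [(2.5, 1.5), (0.5, 0.5)]
grid = kdv_grid(points, width=4, height=3, gamma=1.0)
mask = threshold_mask(grid, tau=1.0)
```

At the scale mentioned in the abstract (one million points at 1280 × 960 pixels), this baseline would need on the order of 10^12 kernel evaluations, which is why bounding the KDE value per pixel instead of computing it exactly is attractive.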
Index Terms
- QUAD: Quadratic-Bound-based Kernel Density Visualization