Abstract
We propose a locally adaptive technique for setting the bandwidth parameters in kernel density estimation. The technique is efficient, requiring only two passes over the dataset. We also show how to apply it to range query approximation, classification, and clustering problems on very large datasets. Experimental results on a variety of synthetic and real datasets validate its efficiency and accuracy.
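To make the idea of a locally adaptive bandwidth concrete, the sketch below shows a classic one-dimensional sample-point estimator: a fixed-bandwidth pilot estimate is computed first, and each point then receives its own bandwidth, wider where the pilot density is low. This is not the procedure proposed in the article, whose details the abstract does not give. The function names, the normal-reference pilot bandwidth, and the sensitivity exponent `alpha = 0.5` are all assumptions chosen for illustration, and the naive pilot step here costs quadratic time rather than two dataset passes.

```python
import math
import statistics


def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fixed_kde(x, data, h):
    """Fixed-bandwidth kernel density estimate at x."""
    return sum(gaussian((x - xi) / h) for xi in data) / (len(data) * h)


def local_bandwidths(data, alpha=0.5):
    """Stage 1: one bandwidth per data point (illustrative, hypothetical helper).

    A pilot estimate with a normal-reference bandwidth is evaluated at every
    point; each local bandwidth is the pilot bandwidth scaled by
    (pilot density / geometric mean of pilot densities) ** (-alpha).
    """
    n = len(data)
    h = 1.06 * statistics.stdev(data) * n ** (-0.2)
    pilot = [fixed_kde(xi, data, h) for xi in data]
    g = math.exp(sum(math.log(p) for p in pilot) / n)  # geometric mean
    return [h * (p / g) ** (-alpha) for p in pilot]


def adaptive_kde(x, data, bandwidths):
    """Stage 2: density estimate at x using the per-point bandwidths."""
    total = sum(gaussian((x - xi) / hi) / hi for xi, hi in zip(data, bandwidths))
    return total / len(data)


if __name__ == "__main__":
    sample = [0.0, 0.1, 0.2, 0.3, 5.0]
    bws = local_bandwidths(sample)
    for xi, hi in zip(sample, bws):
        print(f"x = {xi:4.1f}  local bandwidth = {hi:.3f}")
    print("density at 0.15:", adaptive_kde(0.15, sample, bws))
```

In this toy sample the isolated point at 5.0 receives the widest kernel, while the points in the dense cluster near 0 receive narrower ones, which is the qualitative behaviour a locally adaptive bandwidth is meant to produce.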
Cite this article
Domeniconi, C., Gunopulos, D. An Efficient Density-based Approach for Data Mining Tasks. Know. Inf. Sys. 6, 750–770 (2004). https://doi.org/10.1007/s10115-003-0131-8