
An Efficient Density-based Approach for Data Mining Tasks

Knowledge and Information Systems

Abstract

We propose a locally adaptive technique for setting the bandwidth parameters in kernel density estimation. The technique is efficient, requiring only two passes over the dataset. We also show how to apply it to range query approximation, classification, and clustering on very large datasets. We validate its efficiency and accuracy with experimental results on a variety of synthetic and real datasets.
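To make the idea of a locally adaptive bandwidth concrete, here is a minimal one-dimensional sketch. It is not the authors' algorithm, which the abstract does not describe. It assumes the classical sample-point estimator with Gaussian kernels, a normal-reference pilot bandwidth, and a square-root scaling exponent (`alpha=0.5`). All function names are hypothetical, and the pilot step here costs quadratic time rather than a single pass over the data. The last function shows how such a density estimate can approximate the fraction of points falling in a range query.

```python
import math

def gaussian(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def pilot_bandwidth(data):
    """Global bandwidth from the normal-reference rule h = 1.06 * sigma * n^(-1/5)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return 1.06 * math.sqrt(var) * n ** (-0.2)

def fixed_kde(x, data, h):
    """Fixed-bandwidth kernel density estimate at x."""
    return sum(gaussian((x - xi) / h) for xi in data) / (len(data) * h)

def local_bandwidths(data, h, alpha=0.5):
    """Per-point bandwidths: wider where the pilot density is low (assumed scheme)."""
    pilot = [fixed_kde(xi, data, h) for xi in data]
    # Geometric mean of the pilot densities normalises the scaling factors.
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [h * (p / g) ** (-alpha) for p in pilot]

def adaptive_kde(x, data, bandwidths):
    """Sample-point adaptive estimate: each point carries its own bandwidth."""
    n = len(data)
    return sum(gaussian((x - xi) / hi) / hi for xi, hi in zip(data, bandwidths)) / n

def range_fraction(a, b, data, bandwidths):
    """Estimated fraction of the data in [a, b], integrating each Gaussian kernel."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    n = len(data)
    return sum(cdf((b - xi) / hi) - cdf((a - xi) / hi)
               for xi, hi in zip(data, bandwidths)) / n

data = [0.0, 0.1, 0.2, 0.3, 5.0]
h = pilot_bandwidth(data)
bandwidths = local_bandwidths(data, h)
estimate = range_fraction(-1.0, 1.0, data, bandwidths)
```

In this toy dataset the isolated point at 5.0 receives a wider kernel than the points clustered near zero, which is the behaviour a locally adaptive bandwidth is meant to produce.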




Corresponding author

Correspondence to Carlotta Domeniconi.


About this article

Cite this article

Domeniconi, C., Gunopulos, D. An Efficient Density-based Approach for Data Mining Tasks. Knowl. Inf. Syst. 6, 750–770 (2004). https://doi.org/10.1007/s10115-003-0131-8


  • DOI: https://doi.org/10.1007/s10115-003-0131-8
