An Efficient Density-based Approach for Data Mining Tasks

Domeniconi, Carlotta; Gunopulos, Dimitrios

doi:10.1007/s10115-003-0131-8

Published: 06 February 2004

Volume 6, pages 750–770 (2004)

Knowledge and Information Systems

Carlotta Domeniconi¹ & Dimitrios Gunopulos²

Abstract

We propose a locally adaptive technique to address the problem of setting the bandwidth parameters for kernel density estimation. Our technique is efficient and requires only two passes over the dataset. We also show how to apply it to efficiently solve range query approximation, classification and clustering problems for very large datasets. We validate the efficiency and accuracy of our technique with experimental results on a variety of synthetic and real datasets.
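The abstract does not say how the local bandwidths are chosen, so the following is only a minimal sketch of the general idea, not the authors' technique. It uses a standard sample-point (variable-kernel) estimator in which every sample point carries its own Gaussian kernel, and then shows how such an estimate answers an approximate range-count query. The k-th-nearest-neighbor bandwidth rule, the isotropic Gaussian kernel, the single uniform sample, and the function names `knn_bandwidths`, `density` and `range_count` are all illustrative assumptions.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knn_bandwidths(sample, k):
    """Assumed local bandwidth rule (hypothetical): h_i is the Euclidean
    distance from sample[i] to its k-th nearest other sample point.
    Brute force, O(m^2) in the sample size m."""
    hs = []
    for i, x in enumerate(sample):
        dists = sorted(math.dist(x, y) for j, y in enumerate(sample) if j != i)
        hs.append(max(dists[k - 1], 1e-9))  # guard against duplicate points
    return hs


def density(point, sample, hs):
    """Sample-point adaptive KDE: average of isotropic Gaussian kernels,
    the i-th kernel centred at sample[i] with standard deviation hs[i]."""
    d = len(point)
    total = 0.0
    for x, h in zip(sample, hs):
        sq = sum((p - c) ** 2 for p, c in zip(point, x))
        total += math.exp(-0.5 * sq / h ** 2) / ((2 * math.pi) ** (d / 2) * h ** d)
    return total / len(sample)


def range_count(low, high, sample, hs, n_total):
    """Approximate how many of the n_total points fall in the box [low, high].
    An isotropic Gaussian factors into 1-D Gaussians, so its mass over an
    axis-aligned box is a product of 1-D CDF differences."""
    mass = 0.0
    for x, h in zip(sample, hs):
        p = 1.0
        for a, b, c in zip(low, high, x):
            p *= normal_cdf((b - c) / h) - normal_cdf((a - c) / h)
        mass += p
    return n_total * mass / len(sample)


# Usage on synthetic data: 10,000 points drawn from a 2-D standard normal.
random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10_000)]
sample = random.sample(data, 200)  # one uniform sample (an assumption)
hs = knn_bandwidths(sample, k=10)
estimate = range_count((-1, -1), (1, 1), sample, hs, len(data))
exact = sum(1 for x, y in data if -1 <= x <= 1 and -1 <= y <= 1)
print(f"estimated {estimate:.0f}, exact {exact}")
```

The design choice being illustrated is that kernels in dense regions get narrow bandwidths and kernels in sparse regions get wide ones, and that a Gaussian-kernel estimate answers a box query in closed form, with one CDF difference per kernel and dimension. The brute-force neighbor search is affordable here only because it runs on a small sample rather than on the full dataset.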

Author information

Authors and Affiliations

Information and Software Engineering Department, George Mason University, Fairfax, VA, 22030, USA
Carlotta Domeniconi
Computer Science Department, University of California, Riverside, CA, USA
Dimitrios Gunopulos

Corresponding author

Correspondence to Carlotta Domeniconi.

About this article

Cite this article

Domeniconi, C., Gunopulos, D. An Efficient Density-based Approach for Data Mining Tasks. Know. Inf. Sys. 6, 750–770 (2004). https://doi.org/10.1007/s10115-003-0131-8

Received: 26 January 2002
Revised: 20 December 2002
Accepted: 23 April 2003
Published: 06 February 2004
Issue Date: November 2004
DOI: https://doi.org/10.1007/s10115-003-0131-8
