Optimal Bandwidth Selection for Density-Based Clustering

Jin, Hong; Wang, Shuliang; Zhou, Qian; Li, Ying

doi:10.1007/978-3-642-20244-5_15

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6637)

Abstract

Cluster analysis has long played an important role in a wide variety of data applications. When clusters are irregular or intertwined, density-based clustering has proved to be much more efficient. The quality of the clustering result depends on an adequate choice of parameters. However, without sufficient domain knowledge, setting these parameters is difficult in practice. This paper proposes a new method that automatically finds the optimal value of the bandwidth parameter. The method constructs a parameter-estimation model and uses it to infer the most suitable value: based on Bayes' theorem, the most probable value of the bandwidth is obtained in accordance with the inherent distribution characteristics of the original data set. Clusters can then be identified using the determined parameter values. Experimental results show that the proposed method has complementary advantages for density-based clustering algorithms.

Author information

Authors and Affiliations

State Key Laboratory of Software Engineering, Wuhan University, Wuhan, 430079, China
Hong Jin & Shuliang Wang
International School of Software, Wuhan University, Wuhan, 430079, China
Shuliang Wang & Qian Zhou
School of Mathematics and Statistics, Wuhan University, Wuhan, 430079, China
Ying Li

Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, KLN, Hong Kong, China
Jianliang Xu
School of Information Science and Engineering, Northeastern University, Shenyang, 110004, Liaoning, China
Ge Yu
School of Computer Science, Fudan University, 220 Handan Road, 200433, Shanghai, China
Shuigeng Zhou
Institute for Computer Science and Business Information Systems (ICB), University of Duisburg-Essen, Schützenbahn 70, 45117, Essen, Germany
Rainer Unland

Cite this paper

Jin, H., Wang, S., Zhou, Q., Li, Y. (2011). Optimal Bandwidth Selection for Density-Based Clustering. In: Xu, J., Yu, G., Zhou, S., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20244-5_15

DOI: https://doi.org/10.1007/978-3-642-20244-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20243-8
Online ISBN: 978-3-642-20244-5
eBook Packages: Computer Science, Computer Science (R0)
