PHD: an efficient data clustering scheme using partition space technique for knowledge discovery in large databases

Tsai, Cheng-Fa; Yeh, Heng-Fu; Chang, Jui-Fang; Liu, Ning-Han

doi:10.1007/s10489-010-0239-y

Published: 23 June 2010

Volume 33, pages 39–53, (2010)
Applied Intelligence

Cheng-Fa Tsai¹, Heng-Fu Yeh¹, Jui-Fang Chang² & Ning-Han Liu¹


Abstract

Rapid technological advances mean that the amount of data stored in databases is growing very quickly, and data mining can uncover useful implicit information in these large databases. Finding that information at low time cost, with high accuracy and a high noise-filtering rate, and in a way that scales to large databases is a priority concern in data mining, which explains why many clustering schemes have been proposed in recent decades. This investigation presents a new data clustering approach called PHD, an enhanced version of KIDBSCAN. PHD is a hybrid density-based algorithm that partitions the data set with K-means and then clusters the resulting partitions with IDBSCAN. Finally, the closest pairs of clusters are merged until the natural number of clusters in the data set is reached. Experimental results show that the proposed algorithm completes the entire clustering while efficiently reducing run-time cost. They also indicate that it performs better than several well-known existing schemes such as the K-means, DBSCAN, IDBSCAN and KIDBSCAN algorithms. Consequently, the proposed PHD algorithm is efficient and effective for data clustering in large databases.
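The three stages named in the abstract (partition, cluster each partition, merge) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: plain DBSCAN stands in for IDBSCAN, "closest" is taken to mean smallest centroid distance, the K-means initialization is a simple deterministic one, and the natural number of clusters is passed in as a hypothetical `target` parameter because the abstract does not say how it is determined. All function names are illustrative.

```python
import math

def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def kmeans_partition(points, k, iters=20):
    """Stage 1: split points into up to k groups with Lloyd's K-means."""
    step = max(1, len(points) // k)
    centers = points[::step][:k]  # evenly spaced initial centers (assumption)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the previous center when a group ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]

def dbscan(points, eps, min_pts):
    """Stage 2 stand-in: plain DBSCAN (not IDBSCAN); noise is dropped."""
    labels = {}  # point index -> cluster id, or -1 for noise

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cid = 0
    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        labels[i] = cid
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels.get(j, -1) != -1:
                continue  # already assigned to a cluster
            unvisited = j not in labels
            labels[j] = cid
            if unvisited:
                nb = neighbors(j)
                if len(nb) >= min_pts:  # only core points expand the cluster
                    queue.extend(nb)
        cid += 1
    clusters = [[] for _ in range(cid)]
    for i, c in labels.items():
        if c >= 0:
            clusters[c].append(points[i])
    return clusters

def merge_closest(clusters, target):
    """Stage 3: merge the pair with the nearest centroids until target remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > target:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda ab: math.dist(centroid(clusters[ab[0]]),
                                                   centroid(clusters[ab[1]])))
        clusters[a].extend(clusters.pop(b))
    return clusters

def phd_sketch(points, k, eps, min_pts, target):
    """Partition with K-means, cluster each partition, then merge."""
    clusters = []
    for part in kmeans_partition(points, k):
        clusters.extend(dbscan(part, eps, min_pts))
    return merge_closest(clusters, target)

# Two well-separated 5x5 grids of points with spacing 0.5.
blob_a = [(0.5 * x, 0.5 * y) for x in range(5) for y in range(5)]
blob_b = [(10 + 0.5 * x, 10 + 0.5 * y) for x in range(5) for y in range(5)]
result = phd_sketch(blob_a + blob_b, k=4, eps=0.6, min_pts=3, target=2)
print(len(result), [len(c) for c in result])
```

In this toy run K-means deliberately over-partitions the data (`k=4` for two blobs), so the merge stage has to stitch the fragments of one blob back together. Running the density-based step on small partitions rather than on the whole data set is where a scheme of this shape would save time, although this sketch makes no attempt to reproduce the run-time results reported in the paper.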

References

Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 94–105
Borah B, Bhattacharyya DK (2004) An improved sampling-based DBSCAN for large spatial databases. In: Proceedings of international conference on intelligent sensing and information processing, pp 92–96
Breitenbach M, Grudic GZ (2005) Clustering through ranking on manifolds. In: Proceedings of the 22nd international conference on machine learning, pp 73–80
Chen Y, Rege M, Dong M, Hua J (2008) Non-negative matrix factorization for semi-supervised data clustering. Knowl Inf Syst 17(3):355–379
Cheng H, Hua KA, Vu K (2008) Constrained locally weighted clustering. In: Proceedings of the VLDB endowment, pp 90–101
Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, pp 226–231
Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for clustering. Pattern Recogn 41:176–190
Fisher R (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188
Guha S, Rastogi R, Shim K (1998) CURE: An efficient clustering algorithm for large databases. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 73–84
Guha S, Rastogi R, Shim K (1999) ROCK: A robust clustering algorithm for categorical attributes. In: Proceedings of the 15th international conference on data engineering, pp 512–521
Karypis G, Han EH, Kumar V (1999) CHAMELEON: A hierarchical clustering using dynamic modeling. IEEE Comput 32(8):68–75
MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–297
Tsai C-F, Liu C-W (2006) KIDBSCAN: A new efficient data clustering algorithm for data mining in large databases. Lect Notes Comput Sci (LNCS) 4029:702–711
Tsai C-F, Yen C-C (2007) ANGEL: A new effective and efficient hybrid clustering technique for large databases. Lect Notes Comput Sci (LNCS) 4426:817–824
UCI Repository. http://www.sgi.com/tech/mlc/db/
Wang T-P, Tsai C-F (2006) GDH: An effective and efficient approach to detect arbitrary patterns in clusters with noises in very large databases. Master thesis, National Pingtung University of Science and Technology, Taiwan
Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244
Zhang T, Ramakrishnan R (1996) BIRCH: An efficient data clustering method for very large databases. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 103–114

Author information

Authors and Affiliations

Department of Management Information Systems, National Pingtung University of Science and Technology, 91201, Pingtung, Taiwan
Cheng-Fa Tsai, Heng-Fu Yeh & Ning-Han Liu
Department of International Business, National Kaohsiung University of Applied Sciences, 80778, Kaohsiung, Taiwan
Jui-Fang Chang

Corresponding author

Correspondence to Cheng-Fa Tsai.

About this article

Cite this article

Tsai, CF., Yeh, HF., Chang, JF. et al. PHD: an efficient data clustering scheme using partition space technique for knowledge discovery in large databases. Appl Intell 33, 39–53 (2010). https://doi.org/10.1007/s10489-010-0239-y

Published: 23 June 2010
Issue Date: August 2010
DOI: https://doi.org/10.1007/s10489-010-0239-y
