A novel DBSCAN with entropy and probability for mixed data

Liu, Xingxing; Yang, Qing; He, Ling

doi:10.1007/s10586-017-0818-3

A novel DBSCAN with entropy and probability for mixed data

Published: 17 March 2017

Volume 20, pages 1313–1323, (2017)
Cite this article

Cluster Computing Aims and scope Submit manuscript

Xingxing Liu¹,
Qing Yang¹ &
Ling He¹

We’re sorry, something doesn't seem to be working properly.

Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.

Abstract

In big data situation, to detect clusters of different size and shape is a challenging and imperative task. Density based clustering approaches have been widely used in many areas of science due to its simplicity and the ability to detect clusters of different sizes and shapes over the last several years. With diverse conversion on categorical data, a modified version of the DBSCAN algorithm is proposed to cluster mixed data, noted as density based clustering algorithm for mixed data with integration of entropy and probability distribution (EPDCA). Optional and various conversions are provided for clustering process with adaptability. Some benchmark data sets from UCI have been selected for testing the capability and validity of EPDCA. It was shown that the clustering results of EPDCA are considerably improved, especially on automatically number of clusters formed, noise discovery and time elapsed to form clusters.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Efficient Density-Based Clustering Using Automatic Parameter Detection

VDMR-DBSCAN: Varied Density MapReduce DBSCAN

Merging DBSCAN and Density Peak for Robust Clustering

References

Hsu, C.C., Huang, Y.P.: Incremental clustering of mixed data based on distance hierarchy. Expert Syst. Appl. 35(3), 1177–1185 (2008)
Article Google Scholar
Zhang, X., Wu, Y., Zhao, C.: MrHeter: improving MapReduce performance in heterogeneous environments. Clust. Comput. 19, 1691–1701 (2016)
Article Google Scholar
Kaur, A., Datta, A.: A novel algorithm for fast and scalable subspace clustering of high-dimensional data. J. Big Data 2(1), 1–24 (2015)
Article Google Scholar
Dutta, D., Dutta, P., Sil, J.: Simultaneous feature selection and clustering with mixed features by multi objective genetic algorithm. Int. J. Hybrid Intell. Syst. 11(1), 41–54 (2014)
Article Google Scholar
Sakr, S.: Cloud-hosted databases: technologies, challenges and opportunities. Clust. Comput. 17(2), 87–502 (2014)
Article Google Scholar
Chang, C.S., Liao, W., Chen, Y.S., et al.: A mathematical theory for clustering in metric spaces. IEEE Trans. Netw. Sci. Eng. 3(1), 2–16 (2016)
Article MathSciNet Google Scholar
Parameswari, P., Samath, J.A., Saranya, P.: Efficient birch clustering algorithm for categorical and numerical data using modified co-occurrence method. Int. J. Appl. Eng. Res. 10(11), 27661–27673 (2015)
Google Scholar
Jalal, A.S., Anant, R., Sunita, J., et al.: A density based algorithm for discovering density varied clusters in large spatial databases. Int. J. Comput. Appl. 3(6), 1–4 (2010)
Google Scholar
Lee, J., Lee, Y.J.: An effective dissimilarity measure for clustering of high-dimensional categorical data. Knowl. Inf. Syst. 38(3), 743–757 (2014)
Article Google Scholar
Cao, F., Liang, J., Li, D., et al.: A dissimilarity measure for the k-modes clustering algorithm. Knowl. Based Syst. 26(9), 120–127 (2011)
Google Scholar
Ji, J., Pang, W., Zheng, Y., et al.: A novel cluster center initialization method for the k-prototypes algorithms using centrality and distance. Appl. Math. Inf. Sci. 9(6), 2933–2942 (2015)
Google Scholar
Lee, M., Pedrycz, W.: The fuzzy C-means algorithm with fuzzy P-mode prototypes for clustering objects having mixed features. Fuzzy Sets Syst. 160(24), 3590–3600 (2009)
Article MathSciNet MATH Google Scholar
Sander, J., Ester, M., Kriegel, H.P., et al.: Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Mining Knowl. Discov. 2(2), 169–194 (1998)
Article Google Scholar
Tran, T.N., Wehrens, R., Buydens, L.M.C.: KNN-kernel density-based clustering for high-dimensional multivariate data. Comput. Stat. Data Anal. 51(2), 513–525 (2006)
Article MathSciNet MATH Google Scholar
Hinneburg, A., Keim, D.A.: A general approach to clustering in large databases with noise. Knowl. Inf. Syst. 5(4), 387–415 (2003)
Article Google Scholar
Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344(6191), 1492–1496 (2014)
Article Google Scholar
Sugiyama, M., Niu, G., Yamada, M., et al.: Information-maximization clustering based on squared-loss mutual information. Neural Comput. 26(1), 84–131 (2014)
Article MathSciNet Google Scholar
Tran, T.N., Drab, K., Daszykowski, M.: Revised DBSCAN algorithm to cluster data with dense adjacent clusters. Chemom. Intell. Lab. Syst. 120(2), 92–96 (2013)
Article Google Scholar
Guha, S., Rastogi, R., Shim, K.: ROCK: a robust clustering algorithm for categorical attributes. Inf. Syst. 25(5), 345–366 (2001)
Article Google Scholar
Maulik, U., Bandyopadhyay, S., Saha, I.: Integrating clustering and supervised learning for categorical data analysis. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 40(4), 664–675 (2010)
Article Google Scholar
Ahmad, A., Dey, L.: A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set. Pattern Recognit. Lett. 28(1), 110–118 (2007)
Article Google Scholar
Lin, J., Lin, H.: A density-based clustering over evolving heterogeneous data stream. Int. J. Digit. Content Technol. Appl. 5(6), 275–277 (2009)
Google Scholar
Webb, J.A., Bond, N.R., Wealands, S.R., et al.: Bayesian clustering with AutoClass explicitly recognises uncertainties in landscape classification. Ecography 30(4), 526–536 (2007)
Article Google Scholar
Li, C., Biswas, G.: Unsupervised learning with mixed numeric and nominal data. IEEE Trans. Knowl. Data Eng. 14(4), 673–690 (2002)
Article Google Scholar
Xu, Z., Luo, X., Yu, J., Xu, W.: Measuring semantic similarity between words by removing noise and redundancy in web snippets. Concurr. Comput. 23(18), 2496–2510 (2011)
Article Google Scholar
Wikaisuksakul, S.: A multi-objective genetic algorithm with fuzzy c-means for automatic data clustering. Appl. Soft Comput. 24, 679–691 (2014)
Article Google Scholar
Capitaine, H.L., Frelicot, C.: A cluster-validity index combining an overlap measure and a separation measure based on fuzzy–aggregation operators. IEEE Trans. Fuzzy Syst. 19(3), 580–588 (2011)
Article Google Scholar
Xu, Z., Luo, X., Mei, L., Hu, C.: Measuring the semantic discrimination capability of association relations. Concurr. Comput. 26(2), 380–395 (2014)
Article Google Scholar
Ahmad, A., Dey, L.: A k-mean clustering algorithm for mixed numeric and categorical data. Data Knowl. Eng. 63(2), 503–527 (2007)
Article Google Scholar
Zheng Z, Gong M, Ma J, et al: Unsupervised evolutionary clustering algorithm for mixed type data. In: IEEE Congress on Evolutionary Computation, pp. 1–8 (2009)
Liu, W., Luo, X., Gong, Z., Xuan, J., Kou, N., Xu, Z.: Discovering the core semantics of event from social media. Future Gener. Comput. Syst. 64, 175–185 (2016)
Article Google Scholar
Hsu, C.C., Chen, Y.C.: Mining of mixed data with application to catalog marketing. Expert Syst. Appl. 32(1), 12–23 (2007)
Article Google Scholar
Chao, J., Pang, W., Zhou, C.G.: An improved k-prototypes clustering algorithm for mixed numeric and categorical data. Neurocomputing 120(1), 590–596 (2013)
Google Scholar

Download references

Acknowledgements

The authors are very grateful to the editors and reviewers for their valuable comments and suggestions. This work was supported in part by the Major Projects of the National Social Science Fund of China (Grant Nos. 16ZDA045 and 15ZDB168), National Natural Science Foundation of China (Grant Nos. 71603197, 71371148, and 91024020), Junior Fellowships for CAST Advanced Innovation Think-tank Program (DXB-ZKQN-2016-013), China Postdoctoral Science Foundation Funded Project (2016M592403).

Author information

Authors and Affiliations

Wuhan University of Technology, Wuhan, Hubei, China
Xingxing Liu, Qing Yang & Ling He

Authors

Xingxing Liu
View author publications
You can also search for this author in PubMed Google Scholar
Qing Yang
View author publications
You can also search for this author in PubMed Google Scholar
Ling He
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Qing Yang.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Liu, X., Yang, Q. & He, L. A novel DBSCAN with entropy and probability for mixed data. Cluster Comput 20, 1313–1323 (2017). https://doi.org/10.1007/s10586-017-0818-3

Download citation

Received: 04 November 2016
Revised: 27 February 2017
Accepted: 04 March 2017
Published: 17 March 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10586-017-0818-3

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A novel DBSCAN with entropy and probability for mixed data

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Efficient Density-Based Clustering Using Automatic Parameter Detection

VDMR-DBSCAN: Varied Density MapReduce DBSCAN

Merging DBSCAN and Density Peak for Robust Clustering

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now