ABSTRACT
Data mining is the process of extracting useful and accurate information or patterns from large databases using machine learning algorithms and methods. Clustering is one such method of analysis, in which similar data points are grouped together, and DBSCAN is a clustering algorithm widely used in practical applications. This paper presents a more efficient density-based clustering algorithm that discovers clusters faster than the existing DBSCAN algorithm. The efficiency is achieved by restricting the randomness with which points are chosen from the dataset. The proposed algorithm, named Restricted Randomness DBSCAN (RR DBSCAN), is compared with the conventional DBSCAN algorithm over 9 datasets on the basis of Silhouette Coefficient, time taken to form clusters, and accuracy. The results show that RR DBSCAN performs better than traditional DBSCAN in terms of accuracy and time taken to form clusters.
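For orientation, the baseline and one of the evaluation measures named above can be sketched in a few lines of Python. This is a minimal illustration of conventional DBSCAN and the mean Silhouette Coefficient only; it is not the RR DBSCAN procedure, whose point-selection rule the abstract does not specify. The names `eps`, `min_pts`, `region_query`, and `mean_silhouette` are assumptions chosen for this sketch, as is the decision to leave noise points out of the silhouette average.

```python
import math

NOISE = -1  # label used here for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Conventional DBSCAN with a brute-force neighbour search (O(n^2))."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette over clustered points (noise excluded by assumption)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(points[i], points[j])
                    for j in members if j != i) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(sum(math.dist(points[i], points[j]) for j in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Wrapping a call to `dbscan` between two `time.perf_counter()` readings would give a rough measure of cluster-formation time for the same kind of comparison. The order in which the outer loop visits points is fixed here; a variant that changes how the next point is chosen would replace that loop.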