Abstract
This paper proposes a clustering algorithm, based on the concepts of rough computing and entropy information, that clusters objects into smaller, manageable groups with similar characteristics (equivalence classes). Rough computing is used to handle the uncertainty that information ambiguity introduces into the clustering process. An entropy-based algorithm transforms continuous data into categorical data. The proposed algorithm can cluster data of different types and from different sources, both numerical and categorical. It is implemented and tested on a data set from a pharmaceutical company as a real case study. Cluster purity is used as the performance measure for evaluating cluster quality. The comparison study found that the proposed rough clustering algorithm based on entropy information achieved the highest clustering quality according to the purity and overall purity evaluation criteria.
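The purity and overall purity criteria named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions (purity of a cluster is the fraction of its objects that belong to its most frequent class; overall purity is the size-weighted average over all clusters), which may differ in detail from the formulation in the paper. The function names and the sample labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def cluster_purity(labels):
    """Fraction of objects in one cluster that carry its most frequent class label."""
    if not labels:
        raise ValueError("cluster must contain at least one object")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def overall_purity(clusters):
    """Size-weighted average of the per-cluster purity values."""
    total = sum(len(cluster) for cluster in clusters)
    return sum(len(cluster) / total * cluster_purity(cluster) for cluster in clusters)


# Hypothetical example: each inner list holds the true class labels
# of the objects assigned to one cluster (made-up data).
clusters = [["a", "a", "b"], ["b", "b", "b", "a"]]
per_cluster = [cluster_purity(cluster) for cluster in clusters]
overall = overall_purity(clusters)
```

Under these assumptions, a value of 1.0 means every cluster contains objects of a single class, and lower values indicate more mixed clusters.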
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Soliman, O.S., Hassanien, A.E., El-Bendary, N. (2011). A Rough Clustering Algorithm Based on Entropy Information. In: Corchado, E., Snášel, V., Sedano, J., Hassanien, A.E., Calvo, J.L., Ślȩzak, D. (eds) Soft Computing Models in Industrial and Environmental Applications, 6th International Conference SOCO 2011. Advances in Intelligent and Soft Computing, vol 87. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19644-7_23
DOI: https://doi.org/10.1007/978-3-642-19644-7_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19643-0
Online ISBN: 978-3-642-19644-7
eBook Packages: Engineering, Engineering (R0)