Abstract
This paper studies and analyzes the soft and hard clustering efficiency of a novel approach, frequent pattern growth based fuzzy particle swarm optimization, for clustering web documents. The conventional approaches, K-means and fuzzy c-means (FCM), suffer from random initialization and from getting trapped in local minima. To overcome these drawbacks, bio-inspired mechanisms such as genetic algorithms, ant colony optimization and particle swarm optimization (PSO) are used to optimize K-means and FCM clustering. The major contributions of the novel method are threefold. First, it automatically finds effective cluster numbers, cluster centroids and swarms for the bio-inspired fuzzy particle swarm optimization. Second, it yields fuzzy overlapping clusters using the FCM objective function, overcoming the drawbacks of existing methods. Third, the methodology discussed in this paper prunes irrelevant elements from the search space and thereby retains all relationships with the search query as semantic, conditionally relatable sets. The evaluation results show that our proposed approach performs better on the Adjusted Rand Index (ARI), Normalized Mutual Information (NMI) and Adjusted Concordance Index (ACI) than various distance-based similarity measures and FCMPSO.
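To make the notion of fuzzy overlapping clusters concrete, the following is a minimal sketch of the standard FCM membership update and objective function as usually given in the fuzzy clustering literature. It is not the authors' implementation: the function names, the toy 2-D points, the fixed centroids and the fuzzifier value m = 2 are all assumptions chosen for illustration. In a PSO-based variant, a quantity like this objective could plausibly serve as the fitness that particles minimize, but that wiring is not shown here.

```python
import math

def fcm_memberships(points, centroids, m=2.0):
    """Standard FCM membership update (illustrative, hypothetical helper).

    Returns u where u[i][j] is the degree to which point i belongs to
    cluster j; each row sums to 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centroids]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centroids:
            # share full membership among those centroids only.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [
            1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
            for d_j in dists
        ]
        memberships.append(row)
    return memberships

def fcm_objective(points, centroids, memberships, m=2.0):
    """FCM objective: sum over i, j of u_ij^m * ||x_i - c_j||^2."""
    return sum(
        (u ** m) * math.dist(x, c) ** 2
        for x, row in zip(points, memberships)
        for u, c in zip(row, centroids)
    )

# Toy data (assumed): two tight groups plus one point midway between them.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (2.5, 3.0)]
centroids = [(0.0, 0.5), (5.0, 5.5)]

u = fcm_memberships(points, centroids)
j = fcm_objective(points, centroids, u)

for x, row in zip(points, u):
    print(x, [round(v, 3) for v in row])
print("objective:", round(j, 3))
```

The midway point is equidistant from both centroids, so it receives equal membership in both clusters. A hard partition such as K-means cannot express that overlap.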
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Pamba, R.V., Sherly, E., Mohan, K. (2017). Evaluation of Frequent Pattern Growth Based Fuzzy Particle Swarm Optimization Approach for Web Document Clustering. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10404. Springer, Cham. https://doi.org/10.1007/978-3-319-62392-4_27
DOI: https://doi.org/10.1007/978-3-319-62392-4_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62391-7
Online ISBN: 978-3-319-62392-4
eBook Packages: Computer Science, Computer Science (R0)