Abstract
Text mining, in particular clustering, is mainly used by search engines to increase the recall and precision of search queries. The content of online websites (text, blogs, chats, news, etc.) is dynamically updated, yet relevant information about the changes made is not available. Such a scenario requires a dynamic text clustering method that operates without initial knowledge of the data collection. In this paper, a dynamic text clustering method that utilizes the Firefly algorithm is introduced. The proposed clustering algorithm, aFAmerge, automatically groups text documents into the appropriate number of clusters based on the behavior of fireflies and a cluster merging process. Experiments with aFAmerge were conducted on two datasets: 20Newsgroups and the Reuters news collection. Results indicate that aFAmerge generates more robust and compact clusters than those produced by Bisect K-means and the practical General Stochastic Clustering Method (pGSCM).
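The abstract names two ingredients: firefly behavior and cluster merging. The Python sketch below illustrates each in isolation. It is not the aFAmerge algorithm itself. The movement step follows the standard firefly update described by Yang (2010), x_i ← x_i + β0·exp(−γr²)(x_j − x_i) + α(rand − 0.5). The merging step is a generic, hypothetical rule that joins clusters whose centroids are highly similar under cosine similarity. The parameter values (BETA0, GAMMA, ALPHA), the cosine threshold, and all helper names are illustrative assumptions, not settings taken from the paper.

```python
import math
import random

# Hypothetical parameter defaults for illustration only.
BETA0 = 1.0   # attractiveness at distance r = 0
GAMMA = 1.0   # light absorption coefficient
ALPHA = 0.1   # scale of the random-walk term


def attractiveness(r, beta0=BETA0, gamma=GAMMA):
    """Standard firefly attractiveness: beta(r) = beta0 * exp(-gamma * r^2)."""
    return beta0 * math.exp(-gamma * r * r)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def move_towards(xi, xj, rng, alpha=ALPHA):
    """Move firefly xi towards a brighter firefly xj (standard FA update)."""
    beta = attractiveness(euclidean(xi, xj))
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def merge_clusters(clusters, threshold=0.9):
    """Hypothetical merging rule: repeatedly join the first pair of clusters
    whose centroids have cosine similarity >= threshold, until none remain."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if cosine(centroid(clusters[i]), centroid(clusters[j])) >= threshold:
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters


if __name__ == "__main__":
    # Toy term-weight vectors: two clusters about the same topic, one unrelated.
    docs_a = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]
    docs_b = [[0.95, 0.05, 0.05]]
    docs_c = [[0.0, 1.0, 0.9]]
    print(len(merge_clusters([docs_a, docs_b, docs_c])))  # prints 2
    print(move_towards([0.0, 0.0], [1.0, 0.0], random.Random(0), alpha=0.0))
```

In this toy run, the first two clusters share an identical centroid and are merged, while the unrelated third cluster stays separate. Merging after a swarm-based grouping step is one plausible way to arrive at a final cluster count without fixing it in advance, which is the property the abstract emphasizes.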
References
Sayed, A., Hacid, H., Zighed, D.: Exploring validity indices for clustering textual data. In: Zighed, D.A., Tsumoto, S., Ras, Z.W., Hacid, H. (eds.) Mining Complex Data. Studies in Computational Intelligence, vol. 165, pp. 281–300. Springer, Heidelberg (2009)
Miner, G., Elder, J., Fast, A., Hill, T., Nisbet, R., Delen, D.: Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications, 1st edn. Elsevier, Amsterdam (2012)
Zhang, L., Cao, Q., Lee, J.: A novel ant-based clustering algorithm using Renyi entropy. Appl. Soft Comput. 13(5), 2643–2657 (2013)
Murugesan, K., Zhang, J.: Hybrid bisect K-means clustering algorithm. In: IEEE International Conference on Business Computing and Global Informatization (BCGIN), pp. 216–219. IEEE (2011)
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999)
Steinbach, M., Karypis, G., Kumar, V.: A comparison of document clustering techniques. In: Proceedings of the KDD Workshop on Text Mining, Boston (2000)
Tan, S.C., Ting, K.M., Teng, S.W.: A general stochastic clustering method for automatic cluster discovery. Pattern Recogn. 44(10–11), 2786–2799 (2011)
Feng, L., Qiu, M.H., Wang, Y.X., Xiang, Q.L., Yang, Y.F.: Fast divisive clustering algorithm using an improved discrete particle swarm optimizer. Pattern Recogn. Lett. 31, 1216–1225 (2010)
Kashef, R., Kamel, M.S.: Enhanced bisecting K-means clustering using intermediate cooperation. Pattern Recogn. 42(11), 2557–2569 (2009)
Yujian, L., Liye, X.: Unweighted multiple group method with arithmetic mean. In: IEEE Fifth International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA), pp. 830–834 (2010)
Yin, Y., Kaku, I., Tang, J., Zhu, J.: Data Mining Concepts, Methods and Application in Management and Engineering Design. Springer, London (2011)
Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Pearson Education, New York; Addison-Wesley, Boston (2006)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, 1st edn. Cambridge University Press, Cambridge (2008)
Gil-García, R., Pons-Porrata, A.: Dynamic hierarchical algorithms for document clustering. Pattern Recogn. Lett. 31(6), 469–477 (2010)
Picarougne, F., Azzag, H., Venturini, G., Guinot, C.: A new approach of data clustering using a flock of agents. Evol. Comput. 15(3), 345–367 (2007)
Tan, S.C., Ting, K.M., Teng, S.W.: Simplifying and improving ant-based clustering. Procedia Comput. Sci. 4, 46–55 (2011)
Yang, X.S.: Firefly algorithm, stochastic test functions and design optimization. Int. J. Bio-Inspired Comput. 2(2), 78–84 (2010)
Yang, X.S., He, X.: Firefly algorithm: recent advances and applications. Int. J. Swarm Intell. 1(1), 36–50 (2013)
Mohammed, A.J., Yusof, Y., Husni, H.: Document clustering based on firefly algorithm. J. Comput. Sci. 11(3), 453–465 (2015)
Newsgroup Data Set (2006). http://people.csail.mit.edu/20Newsgroup/
Lewis, D.: The Reuters-21578 text categorization test collection (1999). http://kdd.ics.uci.edu/database/reuters21578/reuters21578.html
Forsati, R., Mahdavi, M., Shamsfard, M., Meybodi, M.R.: Efficient stochastic algorithms for document clustering. Inf. Sci. 220, 269–291 (2013)
Hatamlou, A., Abdullah, S., Nezamabadi-pour, H.: A combined approach for clustering based on K-means and gravitational search algorithms. Swarm Evol. Comput. 6, 47–52 (2012)
Yang, X.S., Hosseini, S.S.S., Gandomi, A.H.: Firefly algorithm for solving non-convex economic dispatch problems with valve loading effect. Appl. Soft Comput. 12(3), 1180–1186 (2012)
Adaniya, M.H.A.C., Abrão, T., Proença Jr., M.L.: Anomaly detection using metaheuristic firefly harmonic clustering. J. Netw. 8(1), 82–91 (2013)
Banati, H., Bajaj, M.: Performance analysis of firefly algorithm for data clustering. Int. J. Swarm Intell. 1(1), 19–35 (2013)
Senthilnath, J., Omkar, S.N., Mani, V.: Clustering using firefly algorithm: performance study. Swarm Evol. Comput. 1(3), 164–171 (2011)
Acknowledgments
The authors would like to thank the Malaysian Ministry of Higher Education for providing financial support under the Fundamental Research Grant Scheme (s/o: 12894). Gratitude also goes to Universiti Utara Malaysia for helping to manage the study.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mohammed, A.J., Yusof, Y., Husni, H. (2015). Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for Text Clustering. In: Badioze Zaman, H., et al. Advances in Visual Informatics. IVIC 2015. Lecture Notes in Computer Science, vol. 9429. Springer, Cham. https://doi.org/10.1007/978-3-319-25939-0_2
DOI: https://doi.org/10.1007/978-3-319-25939-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25938-3
Online ISBN: 978-3-319-25939-0
eBook Packages: Computer Science, Computer Science (R0)