Abstract
Document clustering is an important technique that is widely employed in Information Retrieval (IR). Various clustering techniques have been reported, but the effectiveness of most of them depends on an initial value for the number of clusters, k. Such an approach may not be suitable when there is no prior knowledge of the document collection. Various swarm-based clustering techniques have been proposed to address this problem, and this paper explores the adaptation of the Firefly Algorithm (FA) to document clustering. We extend the work on the Gravitation Firefly Algorithm (GFA) by introducing a relocate mechanism that relocates assigned documents when necessary. The newly proposed clustering algorithm, known as GFA_R, is tested on a benchmark dataset obtained from 20Newsgroups. Experimental results on external and relative quality metrics for GFA_R are compared against those obtained using the standard GFA and Bisect K-means. The results show that extending GFA to GFA_R yields better-quality clustering.
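The abstract does not spell out how the relocate mechanism decides that a document should move. As a rough illustration only, the sketch below assumes that relocation means moving a document to the cluster whose centroid is most cosine-similar to it, and leaving it in place otherwise. The function names (`cosine`, `centroids`, `relocate`) and the toy vectors are hypothetical and are not taken from the paper; this is not the GFA_R procedure itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroids(docs, labels):
    """Mean vector of the documents currently assigned to each cluster."""
    sums, counts = {}, {}
    for vec, lab in zip(docs, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def relocate(docs, labels):
    """One relocation pass (assumed rule, not the paper's): move a document
    only if another cluster's centroid is strictly more similar to it."""
    cents = centroids(docs, labels)
    new_labels = []
    for vec, lab in zip(docs, labels):
        best = max(cents, key=lambda c: cosine(vec, cents[c]))
        if cosine(vec, cents[best]) > cosine(vec, cents[lab]):
            new_labels.append(best)
        else:
            new_labels.append(lab)
    return new_labels


# Toy example: the fourth document starts in cluster 0 but resembles cluster 1.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 0]
relocated = relocate(docs, labels)
print(relocated)
```

In this toy run only the misplaced fourth document changes cluster, and a second pass leaves the assignment unchanged. A real document collection would use weighted term vectors (for example TF-IDF) in place of the two-dimensional vectors shown here.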
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Mohammed, A.J., Yusof, Y., Husni, H. (2014). Nature Inspired Data Mining Algorithm for Document Clustering in Information Retrieval. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_33
DOI: https://doi.org/10.1007/978-3-319-12844-3_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12843-6
Online ISBN: 978-3-319-12844-3
eBook Packages: Computer Science (R0)