Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for Text Clustering

Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza

doi:10.1007/978-3-319-25939-0_2

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9429)

Included in the following conference series:

International Visual Informatics Conference

Abstract

Text mining, and clustering in particular, is widely used by search engines to improve the recall and precision of search queries. The content of online websites (text, blogs, chats, news, etc.) is dynamically updated, yet information about the changes made is not available. Such a scenario requires a dynamic text clustering method that operates without prior knowledge of the data collection. In this paper, a dynamic text clustering method that utilizes the Firefly algorithm is introduced. The proposed clustering algorithm, aFA_merge, automatically groups text documents into an appropriate number of clusters based on the behavior of fireflies and a cluster merging process. Experiments with aFA_merge were conducted on two datasets: 20Newsgroups and the Reuters news collection. Results indicate that aFA_merge generates more robust and compact clusters than those produced by Bisect K-means and the practical General Stochastic Clustering Method (pGSCM).
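
The abstract does not spell out how aFA_merge works, so the following is only a minimal sketch of the attraction step of the standard Firefly algorithm on which such methods are built, not the authors' implementation. The function name `firefly_sweep`, the parameters `beta0`, `gamma` and `alpha`, and the choice to hold brightness fixed during one sweep are assumptions made for illustration. The cluster merging step is not described in the abstract and is not sketched here.

```python
import math
import random


def firefly_sweep(positions, brightness, beta0=1.0, gamma=1.0, alpha=0.0, rng=None):
    """One sweep of the standard firefly attraction update (illustrative only).

    positions  -- list of points (lists of floats), one per firefly
    brightness -- one score per firefly; higher means more attractive
                  (assumed fixed for the duration of the sweep)
    beta0      -- attractiveness at zero distance
    gamma      -- light absorption coefficient
    alpha      -- scale of the random perturbation
    """
    rng = rng or random.Random(0)
    pos = [list(p) for p in positions]  # work on a copy
    n = len(pos)
    for i in range(n):
        for j in range(n):
            # A firefly moves only towards brighter fireflies.
            if brightness[j] > brightness[i]:
                r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                # Attractiveness decays with squared distance.
                beta = beta0 * math.exp(-gamma * r2)
                pos[i] = [
                    a + beta * (b - a) + alpha * (rng.random() - 0.5)
                    for a, b in zip(pos[i], pos[j])
                ]
    return pos


# Hypothetical usage: the dimmer firefly moves towards the brighter one.
moved = firefly_sweep([[0.0, 0.0], [1.0, 0.0]], [0.1, 0.9],
                      beta0=0.5, gamma=0.0, alpha=0.0)
print(moved)
```

In a text clustering setting, each position would presumably be a document vector and the brightness some measure of how well a document represents its neighbourhood, but that mapping is an assumption rather than something stated in the abstract.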

Acknowledgments

The authors would like to thank the Malaysian Ministry of Higher Education for providing financial support under the Fundamental Research Grant Scheme (s/o: 12894). Gratitude also goes to Universiti Utara Malaysia for its help in managing the study.

Author information

Authors and Affiliations

School of Computing, College of Arts and Sciences, Universiti Utara Malaysia, 06010, Sintok, Kedah, Malaysia
Athraa Jasim Mohammed, Yuhanis Yusof & Husniza Husni
University of Technology, Baghdad, Iraq
Athraa Jasim Mohammed

Corresponding author

Correspondence to Athraa Jasim Mohammed.

Editor information

Editors and Affiliations

Fac Info Science and Techn, Universiti Kebangsaan Malaysia, Selangor, Malaysia
Halimah Badioze Zaman
University of Cambridge, Cambridge, United Kingdom
Peter Robinson
Center for Digital Video Process, Dublin 9, Ireland
Alan F. Smeaton
Computer Science and Information Enginee, National Central University, Jhongli City, Taiwan
Timothy K. Shih
Kingston University, Kingston upon Thames, United Kingdom
Sergio Velastin
Universiti Kebangsaan Malaysia, Institute of Visual Informatics, Bangi, Malaysia
Azizah Jaafar
Universiti Kebangsaan Malaysia, Institute of Visual Informatics, Bangi, Malaysia
Nazlena Mohamad Ali

About this paper

Cite this paper

Mohammed, A.J., Yusof, Y., Husni, H. (2015). Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for Text Clustering. In: Badioze Zaman, H., et al. Advances in Visual Informatics. IVIC 2015. Lecture Notes in Computer Science, vol 9429. Springer, Cham. https://doi.org/10.1007/978-3-319-25939-0_2

DOI: https://doi.org/10.1007/978-3-319-25939-0_2
Published: 27 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25938-3
Online ISBN: 978-3-319-25939-0
eBook Packages: Computer Science, Computer Science (R0)
