Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for Text Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9429)

Abstract

Text mining, and clustering in particular, is widely used by search engines to improve the recall and precision of search queries. The content of online websites (text, blogs, chats, news, etc.) is dynamically updated, yet relevant information about the changes made is not available. Such a scenario requires a dynamic text clustering method that operates without prior knowledge of the data collection. In this paper, a dynamic text clustering method that utilizes the Firefly algorithm is introduced. The proposed clustering algorithm, aFAmerge, automatically groups text documents into an appropriate number of clusters based on the behavior of fireflies and a cluster merging process. Experiments with aFAmerge were conducted on two datasets: 20Newsgroups and the Reuters news collection. Results indicate that aFAmerge generates more robust and compact clusters than those produced by Bisect K-means and the practical General Stochastic Clustering Method (pGSCM).
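The abstract names two ingredients, firefly behavior and cluster merging, without giving their details. As a rough illustration only, the sketch below shows the widely used generic firefly movement rule (a firefly moves toward a brighter one with attractiveness beta0 * exp(-gamma * r^2)) and a simple greedy merge of clusters whose centroids are similar under cosine similarity. It is not the aFAmerge algorithm: the function names, the merge criterion, the threshold value, and all parameter defaults are assumptions made for this example.

```python
import math
import random


def attractiveness(beta0, gamma, r):
    """Generic firefly attractiveness at distance r: beta0 * exp(-gamma * r^2)."""
    return beta0 * math.exp(-gamma * r * r)


def move_firefly(xi, xj, beta0=1.0, gamma=1.0, alpha=0.0, rng=random):
    """Move firefly xi toward a brighter firefly xj (generic update rule).

    alpha scales a small uniform random perturbation; alpha=0 is deterministic.
    """
    r = math.dist(xi, xj)
    beta = attractiveness(beta0, gamma, r)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def merge_clusters(clusters, threshold=0.9):
    """Hypothetical merge step: repeatedly merge the most similar pair of
    clusters whose centroid cosine similarity is at least `threshold`."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if s >= threshold and (best is None or s > best[0]):
                    best = (s, i, j)
        if best is None:
            break  # no remaining pair is similar enough
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


# Toy document vectors: two near-identical clusters and one distinct cluster.
toy = [[[1.0, 0.0], [0.9, 0.1]], [[0.95, 0.05]], [[0.0, 1.0]]]
merged = merge_clusters(toy, threshold=0.9)
```

In this toy run the first two clusters are merged and the third stays separate, so the number of clusters falls out of the data instead of being fixed in advance, which is the general idea the abstract describes.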

Acknowledgments

The authors would like to thank the Malaysian Ministry of Higher Education for providing financial support under the Fundamental Research Grant Scheme (s/o: 12894). Gratitude also goes to Universiti Utara Malaysia for its help in managing the study.

Author information

Corresponding author

Correspondence to Athraa Jasim Mohammed.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mohammed, A.J., Yusof, Y., Husni, H. (2015). Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for Text Clustering. In: Badioze Zaman, H., et al. Advances in Visual Informatics. IVIC 2015. Lecture Notes in Computer Science, vol 9429. Springer, Cham. https://doi.org/10.1007/978-3-319-25939-0_2

  • DOI: https://doi.org/10.1007/978-3-319-25939-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25938-3

  • Online ISBN: 978-3-319-25939-0

  • eBook Packages: Computer Science, Computer Science (R0)
