Abstract
Text clustering is widely used to group digital documents. The selection of cluster centers plays an important role in clustering. In this paper, we use the artificial bee colony (ABC) algorithm to select appropriate cluster centers for clustering text documents. ABC is a population-based, nature-inspired algorithm that simulates the intelligent foraging behavior of real honey bees and has been shown to be effective in solving many search and optimization problems. However, a major drawback of the algorithm is that it provides good exploration of the search space at the cost of exploitation. We improve the search equation of ABC and embed two local search paradigms, namely chaotic local search and gradient search, in the basic ABC to improve its exploitation capability. The proposed algorithm is named chaotic gradient artificial bee colony. Its effectiveness is tested on three benchmark text datasets: Reuters-21578, Classic4, and WebKB. The results are compared with those of ABC, a recent variant of ABC (gbest-guided ABC), a variant of the proposed methodology (chaotic artificial bee colony), memetic ABC, and the conventional clustering algorithm K-means. The empirical evaluation shows encouraging results in terms of solution quality and convergence speed.
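The idea of pairing ABC's exploratory search with a chaotic local search around the best cluster centers can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the improved search equation or the gradient search, so the code uses the basic ABC search equation v_ij = x_ij + phi * (x_ij - x_kj) and only the employed-bee phase with greedy selection. The logistic map, the `radius` and `steps` parameters, the sum-of-squared-errors objective, and all function names are assumptions chosen for illustration. Each food source is a flat list holding the coordinates of all cluster centers.

```python
import random


def sse(solution, points, dim):
    """Sum of squared distances from each point to its nearest center."""
    centers = [solution[s:s + dim] for s in range(0, len(solution), dim)]
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )


def abc_candidate(sources, i, rng):
    """Basic ABC search equation: perturb one coordinate of source i
    relative to a randomly chosen different source k."""
    k = rng.choice([s for s in range(len(sources)) if s != i])
    j = rng.randrange(len(sources[i]))
    phi = rng.uniform(-1.0, 1.0)
    candidate = list(sources[i])
    candidate[j] += phi * (sources[i][j] - sources[k][j])
    return candidate


def chaotic_local_search(best, objective, radius=0.1, steps=20, z=0.7):
    """Local search driven by a logistic-map sequence (an assumed map).
    Keeps a trial only if it lowers the objective."""
    best = list(best)
    best_cost = objective(best)
    for _ in range(steps):
        trial = []
        for x in best:
            z = 4.0 * z * (1.0 - z)  # logistic map, values stay in [0, 1]
            trial.append(x + radius * (2.0 * z - 1.0))
        cost = objective(trial)
        if cost < best_cost:
            best, best_cost = trial, cost
    return best, best_cost


def cluster(points, n_clusters, n_sources=10, iterations=50, seed=0):
    """Employed-bee phase plus chaotic refinement of the current best source.
    Onlooker and scout phases are omitted to keep the sketch short."""
    rng = random.Random(seed)
    dim = len(points[0])

    def objective(solution):
        return sse(solution, points, dim)

    # Initialise each source from randomly sampled data points.
    sources = [
        [c for p in rng.sample(points, n_clusters) for c in p]
        for _ in range(n_sources)
    ]
    costs = [objective(s) for s in sources]
    for _ in range(iterations):
        for i in range(n_sources):
            candidate = abc_candidate(sources, i, rng)
            cost = objective(candidate)
            if cost < costs[i]:  # greedy selection
                sources[i], costs[i] = candidate, cost
        b = min(range(n_sources), key=costs.__getitem__)
        sources[b], costs[b] = chaotic_local_search(sources[b], objective)
    b = min(range(n_sources), key=costs.__getitem__)
    return sources[b], costs[b]


# Toy 2-D data standing in for document vectors.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, cost = cluster(points, n_clusters=2)
```

For text clustering, `points` would be term-weight vectors of the documents rather than the toy 2-D data used here. Because both the greedy selection and the local search accept only improvements, the cost of every food source never increases from one iteration to the next.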
Additional information
Communicated by V. Loia.
Cite this article
Bharti, K.K., Singh, P.K. Chaotic gradient artificial bee colony for text clustering. Soft Comput 20, 1113–1126 (2016). https://doi.org/10.1007/s00500-014-1571-7
DOI: https://doi.org/10.1007/s00500-014-1571-7