Abstract
Clustering divides data into meaningful or useful groups (clusters) without any prior knowledge. It is a key technique in data mining and has become an important issue in many fields. This article presents a new clustering algorithm based on an analysis of the mechanism of Bacterial Foraging (BF). It is an optimization methodology for the clustering problem in which a group of bacteria forage and converge to certain positions, which serve as the final cluster centers, by minimizing the fitness function. The quality of this approach is evaluated on several well-known benchmark data sets. Compared with the popular k-means algorithm, an ACO-based algorithm and a PSO-based clustering technique, the experimental results show that the proposed algorithm is an effective clustering technique and can handle data sets with various cluster sizes, densities and multiple dimensions.
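The idea of bacteria foraging toward cluster centers can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define the fitness function or the update rules, so the sketch assumes the fitness is the sum of squared distances from each point to its nearest center, and it implements only a simplified chemotaxis step (tumble, then swim while the fitness improves), leaving out reproduction and elimination-dispersal. The names `sse`, `bf_cluster`, and all parameters are hypothetical.

```python
import math
import random


def sse(points, centers):
    """Assumed fitness: sum of squared distances to the nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centers)
        for pt in points
    )


def random_unit_vector(dim, rng):
    """Random direction used for a tumble."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def bf_cluster(points, k, n_bacteria=10, n_steps=100, swim_len=4,
               step=0.1, seed=0):
    """Simplified chemotaxis-only foraging; each bacterium holds k centers."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise every bacterium with k centers drawn from the data.
    bacteria = [[list(p) for p in rng.sample(points, k)]
                for _ in range(n_bacteria)]
    costs = [sse(points, b) for b in bacteria]

    for _ in range(n_steps):
        for i in range(n_bacteria):
            # Tumble: pick one random direction in the k*dim search space.
            d = random_unit_vector(k * dim, rng)
            # Swim: keep moving along it while the fitness improves.
            for _ in range(swim_len):
                cand = [
                    [c[j] + step * d[ci * dim + j] for j in range(dim)]
                    for ci, c in enumerate(bacteria[i])
                ]
                cand_cost = sse(points, cand)
                if cand_cost < costs[i]:
                    bacteria[i], costs[i] = cand, cand_cost
                else:
                    break

    best = min(range(n_bacteria), key=costs.__getitem__)
    return bacteria[best], costs[best]


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, cost = bf_cluster(points, k=2)
print(centers, cost)
```

Because a move is accepted only when it lowers the assumed fitness, each bacterium's cost never increases. The best bacterium at the end is therefore at least as good as the best initial guess.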
Acknowledgements
I would like to thank the editor and all the reviewers for their great support of our work. Our study is also supported by the National Basic Research Program of China (973 Program) (2007CB311203), the National Natural Science Foundation of China (Grant Nos. 60805043, 60821001), the Beijing Natural Science Foundation (Grant No. 4092029), the Huo Ying-Dong Education Foundation of China (Grant No. 121062), and the Foundation for the Author of National Excellent Doctoral Dissertation of PR China (FANEDD) (Grant No. 200951).
Cite this article
Wan, M., Li, L., Xiao, J. et al. Data clustering using bacterial foraging optimization. J Intell Inf Syst 38, 321–341 (2012). https://doi.org/10.1007/s10844-011-0158-3
DOI: https://doi.org/10.1007/s10844-011-0158-3