Abstract
Data clustering methods are used extensively in the data mining literature to detect important patterns in large datasets, in the form of densely populated regions in a multi-dimensional Euclidean space. Because of the complexity of the problem and the size of the datasets, obtaining quality solutions within reasonable CPU time and memory requirements becomes the central challenge. In this paper, we solve the clustering problem as a large-scale p-median model, using a new approach based on the variable neighborhood search (VNS) metaheuristic. Using a highly efficient data structure and local updating procedure taken from the OR literature, our VNS procedure is able to tackle large datasets directly, without the data reduction or sampling employed in certain popular methods. Computational results demonstrate that our VNS heuristic outperforms other local-search-based methods such as CLARA and CLARANS, even after these procedures are upgraded with the same efficient data structures and local search. We also obtain a bound on the quality of the solutions by heuristically solving a dual relaxation of the problem, thus introducing an important capability to the solution process.
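To make the p-median view of clustering concrete, the following minimal sketch may help. It is not the authors' implementation. It assumes Euclidean points, picks p of them as medians so that the total distance from every point to its nearest median is small, and wraps a naive interchange local search in a basic VNS loop. The function names (`assignment_cost`, `local_search`, `shake`, `vns`) and the parameters `k_max`, `iterations`, and `seed` are illustrative assumptions. The efficient data structure, the fast local updating, and the dual bound mentioned in the abstract are deliberately omitted, so this version recomputes the full cost for every trial swap and suits only small inputs.

```python
import math
import random


def assignment_cost(points, medians):
    """p-median objective: total distance from each point to its nearest median."""
    return sum(min(math.dist(p, points[m]) for m in medians) for p in points)


def local_search(points, medians):
    """Naive interchange: apply improving one-out/one-in swaps until none is left.

    Assumption: full cost recomputation per swap, not the fast update scheme
    the abstract refers to.
    """
    medians = set(medians)
    best = assignment_cost(points, medians)
    improved = True
    while improved:
        improved = False
        for out in list(medians):
            for cand in range(len(points)):
                if cand in medians:
                    continue
                trial = (medians - {out}) | {cand}
                cost = assignment_cost(points, trial)
                if cost < best - 1e-12:
                    medians, best, improved = trial, cost, True
                    break
            if improved:
                break  # restart the scan from the improved solution
    return medians, best


def shake(medians, n, k, rng):
    """Move to a random solution k swaps away from the current one."""
    medians = set(medians)
    k = min(k, len(medians), n - len(medians))
    leaving = rng.sample(sorted(medians), k)
    entering = rng.sample(sorted(set(range(n)) - medians), k)
    return (medians - set(leaving)) | set(entering)


def vns(points, p, k_max=3, iterations=20, seed=0):
    """Basic VNS: shake in neighborhood k, descend, recenter on improvement."""
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = local_search(points, rng.sample(range(n), p))
    for _ in range(iterations):
        k = 1
        while k <= k_max:
            cand, cost = local_search(points, shake(best, n, k, rng))
            if cost < best_cost - 1e-12:
                best, best_cost, k = cand, cost, 1  # improvement: back to k = 1
            else:
                k += 1  # no improvement: try a larger neighborhood
    return sorted(best), best_cost


if __name__ == "__main__":
    # Two small, well-separated groups of hypothetical sample points.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(vns(data, p=2))
```

Each point is then assigned to the cluster of its nearest median. Choosing medians among the data points themselves is what distinguishes the p-median model from centroid-based formulations such as k-means.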
Additional information
Responsible editor: Charu Aggarwal.
Cite this article
Hansen, P., Brimberg, J., Urošević, D. et al. Solving large p-median clustering problems by primal–dual variable neighborhood search. Data Min Knowl Disc 19, 351–375 (2009). https://doi.org/10.1007/s10618-009-0135-4
DOI: https://doi.org/10.1007/s10618-009-0135-4