Solving large p-median clustering problems by primal–dual variable neighborhood search

Data Mining and Knowledge Discovery

Abstract

Data clustering methods are used extensively in the data mining literature to detect important patterns in large datasets in the form of densely populated regions in a multi-dimensional Euclidean space. Due to the complexity of the problem and the size of the dataset, obtaining quality solutions within reasonable CPU time and memory requirements becomes the central challenge. In this paper, we solve the clustering problem as a large scale p-median model, using a new approach based on the variable neighborhood search (VNS) metaheuristic. Using a highly efficient data structure and local updating procedure taken from the OR literature, our VNS procedure is able to tackle large datasets directly without the need for data reduction or sampling as employed in certain popular methods. Computational results demonstrate that our VNS heuristic outperforms other local search based methods such as CLARA and CLARANS even after upgrading these procedures with the same efficient data structures and local search. We also obtain a bound on the quality of the solutions by solving heuristically a dual relaxation of the problem, thus introducing an important capability to the solution process.
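To make the terms in the abstract concrete, the following is a minimal sketch of a basic VNS for the p-median problem: choose p of the n data points as medians so that the sum of distances from each point to its nearest median is minimal. It is not the authors' primal-dual procedure. It recomputes the objective from scratch for every swap instead of using the fast interchange data structures the abstract refers to, and it computes no dual bound. All function and parameter names (`pmedian_cost`, `interchange_local_search`, `basic_vns`, `k_max`, `iterations`) are illustrative assumptions.

```python
import random


def pmedian_cost(dist, medians):
    """Sum over all points of the distance to the nearest chosen median."""
    return sum(min(row[j] for j in medians) for row in dist)


def interchange_local_search(dist, medians):
    """First-improvement swap descent: trade one median for one non-median."""
    n = len(dist)
    best = pmedian_cost(dist, medians)
    improved = True
    while improved:
        improved = False
        for out in sorted(medians):
            for cand in range(n):
                if cand in medians:
                    continue
                trial = (medians - {out}) | {cand}
                cost = pmedian_cost(dist, trial)  # naive full re-evaluation
                if cost < best:
                    medians, best, improved = trial, cost, True
                    break
            if improved:
                break
    return medians, best


def basic_vns(dist, p, k_max=3, iterations=50, seed=0):
    """Basic VNS: shake by k random swaps, descend, recentre on improvement."""
    rng = random.Random(seed)
    n = len(dist)
    start = set(rng.sample(range(n), p))
    medians, best = interchange_local_search(dist, start)
    for _ in range(iterations):
        k = 1
        while k <= min(k_max, p, n - p):
            # Shaking: replace k random medians by k random non-medians.
            removed = rng.sample(sorted(medians), k)
            added = rng.sample(sorted(set(range(n)) - medians), k)
            shaken = (medians - set(removed)) | set(added)
            cand, cost = interchange_local_search(dist, shaken)
            if cost < best:
                medians, best, k = cand, cost, 1  # move and restart at k = 1
            else:
                k += 1  # try a larger neighborhood
    return medians, best
```

Here `dist` is assumed to be a full n × n distance matrix given as a list of lists, and `basic_vns(dist, p)` returns the set of chosen median indices together with the objective value. The naive cost evaluation makes each swap cost on the order of n·p operations, which is exactly the overhead that efficient updating procedures are meant to avoid on large datasets.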

Author information

Corresponding author

Correspondence to Dragan Urošević.

Additional information

Responsible editor: Charu Aggarwal.

About this article

Cite this article

Hansen, P., Brimberg, J., Urošević, D. et al. Solving large p-median clustering problems by primal–dual variable neighborhood search. Data Min Knowl Disc 19, 351–375 (2009). https://doi.org/10.1007/s10618-009-0135-4

  • DOI: https://doi.org/10.1007/s10618-009-0135-4
