Hybrid heuristics based on column generation with path-relinking for clustering problems

doi:10.1016/j.eswa.2014.03.008

Expert Systems with Applications

Volume 41, Issue 11, 1 September 2014, Pages 5277-5284

https://doi.org/10.1016/j.eswa.2014.03.008

Highlights

• The paper examines hybrid heuristics for solving clustering problems.
• Methods are based on the application of a column generation technique for solving p-median problems.
• Five heuristics are tested with CRand indexes.
• Computational results are compared with recent methods in literature.

Abstract

This paper examines hybrid heuristics for solving clustering problems. The clustering problem can be defined as the process of separating a set of objects into groups such that members of a group are similar to each other. The methods are based on the application of a column generation technique for solving p-median problems. Five heuristics are derived directly from the column generation algorithm: a solution made feasible from the master problem; the column generation solution; a heuristic with path-relinking that considers the initial columns of the column generation procedure; a solution of the master problem with path-relinking; and the column generation process with path-relinking. Solutions are tested with the external measure CRand, and the computational results are compared with recent methods in the literature.

Introduction

The clustering problem is defined as the process of separating a set of objects into groups such that members of a group are similar to each other (Lorena & Furtado, 2001). The similarity between individuals depends on the metric chosen for the distance between objects. External measures, such as the Rand index (Rand, 1971) and the corrected Rand (CRand) index (Hubert & Arabie, 1985), are used to measure the similarity between two partitions of the same set.
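As an illustration of the two external measures cited above, the following is a minimal sketch in Python, assuming each partition is given as a list of group labels, one per object. The function names `rand_index` and `corrected_rand` are hypothetical and are not taken from the paper; the formulas are the standard pair-counting definitions of Rand (1971) and Hubert & Arabie (1985).

```python
from collections import Counter
from math import comb


def rand_index(labels_a, labels_b):
    """Rand (1971): fraction of object pairs on which two partitions agree."""
    n = len(labels_a)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            agree += same_a == same_b  # both together or both apart
    return agree / comb(n, 2)


def corrected_rand(labels_a, labels_b):
    """Hubert & Arabie (1985): Rand index corrected for chance agreement."""
    n = len(labels_a)
    # Pairs placed together in both partitions (contingency-table cells).
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs_both - expected) / (max_index - expected)
```

Both functions assume at least two objects. The corrected index equals 1 for identical partitions (up to relabeling) and can be negative when agreement is below what chance would give.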

The difficulty in solving the clustering problem lies in identifying which objects are close to each other and in developing methods that group them accordingly; finding good solutions, i.e., classifying objects efficiently, is hard. The classification of data can be supervised or unsupervised (Abbasi & Younis, 2007). In supervised analysis, the method is trained with known data patterns and then applied to new samples. In unsupervised analysis, the algorithm seeks data structures that allow separation into groups without prior knowledge of the patterns. Although unsupervised classification methods yield less accurate results than supervised methods, they are more suitable when no prior information about the groups is available.

The clustering problem examined in this paper is unsupervised, i.e., concerned with grouping related objects without class or label information (Nascimento & De Carvalho, 2011). Clustering has been applied in a wide variety of research areas, such as machine learning, artificial intelligence, pattern recognition, spatial data mining, image segmentation, genetics, microbiology, geology and remote sensing, among others (Xu and Wunsch, 2005; Jun et al., 2014). Metaheuristics have been applied to solve it, such as the greedy randomized adaptive search procedure (Nascimento, Toledo, & de Leon Ferreira de Carvalho, 2010) and genetic algorithms (Agustín-Blas et al., 2012).

This work proposes five hybrid heuristics to solve clustering problems. All of them are based on the application of a column generation technique for solving p-median problems (Senne, Lorena, & Pereira, 2007). The five approaches are: a solution made feasible from the master problem, the column generation solution, a heuristic with path-relinking considering the initial columns of the column generation procedure, a solution of the master problem with path-relinking and the column generation process with path-relinking. All clustering solutions are tested with the external measure CRand and the computational results compared to recent methods in literature.
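Three of the five approaches listed above rely on path-relinking. The following is a generic sketch of path-relinking between two p-median solutions, not the authors' procedure: it assumes a solution is a set of median indices, that `dist` is a square matrix of dissimilarities, and that both solutions contain the same number of medians. The names `pmedian_cost` and `path_relinking` are hypothetical.

```python
def pmedian_cost(dist, medians):
    """Sum over all objects of the distance to the nearest median."""
    return sum(min(row[m] for m in medians) for row in dist)


def path_relinking(dist, start, guide):
    """Walk from `start` toward `guide` by single swaps; return best set seen."""
    current, target = set(start), set(guide)
    if len(current) != len(target):
        raise ValueError("both solutions must have the same number of medians")
    best, best_cost = set(current), pmedian_cost(dist, current)
    while current != target:
        to_remove = current - target  # medians absent from the guiding solution
        to_add = target - current     # guiding medians not yet in the current set
        # Pick the cheapest swap that moves one step closer to the guide.
        cost, out_m, in_m = min(
            (pmedian_cost(dist, (current - {o}) | {i}), o, i)
            for o in to_remove
            for i in to_add
        )
        current = (current - {out_m}) | {in_m}
        if cost < best_cost:
            best, best_cost = set(current), cost
    return best, best_cost
```

The intermediate solutions visited along the path may be better than both endpoints, which is the motivation for combining path-relinking with solutions produced at different stages of column generation.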

The paper is organized as follows. Section 2 gives a brief literature review on the clustering problem. Section 3 presents an overview of column generation for p-median problems. Section 4 describes the hybrid heuristics for solving the clustering problem. Section 5 presents the data, the distances and correlations used to calculate the dissimilarity between samples, and the computational results. Section 6 presents some conclusions.

Section snippets

Abbreviated literature review

The clustering problem has been extensively studied. Rand (1971) proposes criteria that isolate aspects of the performance of a method, such as retrieval, sensitivity and stability. These criteria depend on a similarity measure between two different clusterings of the same set of data; the measure essentially considers how each pair of data points is assigned in each clustering.

Handl, Knowles, and Kell (2005) show the large number of techniques available for validating the results obtained for the problem,

Column generation for p-median problems

The heuristics for solving clustering problems start from a set of data and, without any prior information about patterns, build groups whose objects have similar characteristics. The groups are obtained by a column generation technique proposed for solving p-median problems. P-median solutions minimize the sum of distances between nodes and their nearest facility (median), and nodes allocated to the same facility are expected to have similar characteristics. Thus, a feasible
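The link between a set of medians and the resulting groups can be sketched as follows. This is a minimal illustration, assuming `dist` is a square dissimilarity matrix and `medians` is a collection of object indices chosen as medians; the function name `clusters_from_medians` is hypothetical and the code does not reproduce the column generation procedure itself.

```python
def clusters_from_medians(dist, medians):
    """Assign each object to its nearest median; return (labels, total cost)."""
    medians = sorted(medians)  # fixed order so ties are broken deterministically
    labels, cost = [], 0.0
    for row in dist:
        nearest = min(medians, key=lambda m: row[m])
        labels.append(nearest)   # the median index serves as the group label
        cost += row[nearest]     # contribution to the p-median objective
    return labels, cost
```

The returned labels define a partition of the objects, which can then be compared with a reference partition using an external measure such as CRand.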

The hybrid heuristics

The hybrid heuristics can be classified as a combination of metaheuristics and column generation (CG). Two basic possibilities are explored in the literature: applying metaheuristics to the pricing subproblems or directly to the master problem (MP), either to create initial columns or to produce incoming columns for CG (Mauri and Lorena, 2007; Pirkwieser and Raidl, 2010; Filho and Lorena, 2000; Massen et al., 2013). The hybrid heuristics proposed in this work can be seen as a third option where the CG process generates

Computational results

This section presents the data and the computational results used to validate the proposed methods. A total of eight data sets are used, namely Iris, Yeast, Breast, BreastA, BreastB, Proteins, DLBCLA and DLBCLB. The protein data were obtained from http://ranger.uta.edu/chqding/protein. The Yeast, Breast and Iris data were obtained from the UCI repository (Abbasi & Younis, 2007). The BreastA, BreastB, DLBCLA and DLBCLB data come from the cancer program data repository (//www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

Conclusion

Clustering problems appear in various contexts and real life applications. This work presents a contribution to the solution of clustering problems using hybrid heuristics that combine stages of a column generation process to solve p-median problems.

Five different approaches have been proposed, considering solutions from the initial columns, the master problem and the column generation. They are combined with a path-relinking approach and the results are evaluated by the CRand index. The

Acknowledgment

The authors thank FAPES (process 59830042/2012), CNPq (processes 476862/2012-4, 471837/2008-3, 300692/2009-9, 300747/2010-1 and 477148/2011-5) and CAPES.

References (25)

A.A. Abbasi et al. (2007). A survey on clustering algorithms for wireless sensor networks. Computer Communications.
L.E. Agustín-Blas et al. (2012). A new grouping genetic algorithm for clustering problems. Expert Systems with Applications.
D.-X. Chang et al. (2009). A genetic algorithm with gene rearrangement for k-means clustering. Pattern Recognition.
Y. Hong et al. (2008). Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognition.
S. Jun et al. (2014). Document clustering method using dimension reduction and support vector clustering to overcome sparseness. Expert Systems with Applications.
S. Mitra et al. (2006). Multi-objective evolutionary biclustering of gene expression data. Pattern Recognition.
M.C.V. Nascimento et al. (2011). Spectral methods for graph clustering – a survey. European Journal of Operational Research.
M.C.V. Nascimento et al. (2010). Investigation of a new GRASP-based clustering algorithm applied to biological data. Computers & Operations Research.
A.A. Chaves et al. (2010). Clustering search algorithm for the capacitated centered clustering problem. Computers & Operations Research.
Filho, G. R., & Lorena, L. A. N. (2000). Constructive genetic algorithm and column generation: An application to graph...
G.R. Filho et al. (2010). Uma heurística de geração de colunas para o problema de formação de células de máquinas e partes. Pesquisa Operacional para o Desenvolvimento.
J. Handl et al. (2005). Computational cluster validation in post-genomic data analysis. Bioinformatics.

Cited by (6)

A hybrid heuristic for overlapping community detection through the conductance minimization
2022, Physica A: Statistical Mechanics and its Applications
Citation Excerpt :
An alternative is to combine heuristics with exact methods in order to produce high-quality solutions at a reasonable computational time. Such methods are known as hybrid heuristics or matheuristics [9], which lately have been successfully applied to optimization and clustering problems [5,10–13]. Then, the main contribution of this work is the proposal of a hybrid heuristic for detecting overlapping communities of a graph by the minimization of the conductance metric.
Community structures, which are sets of elements that share some relationship between themselves, can be found in several real-world networks. Many of these communities, also known as clusters, can share elements, i.e., they may overlap. Identifying such overlapping clusters is usually a harder task than finding non-overlapping ones and, therefore, it needs more sophisticated methods. In this work we proposed a hybrid heuristic for detecting overlapping clusters in networks. An overlapping clustering is generated through the solving of a mixed-integer linear program using, as input, a heterogeneous set of good-quality clusters. This set is produced by two state-of-the-art overlapping community detection algorithms. In addition, some local search methods for conductance minimization are used to improve the quality of the clustering generated by our hybrid heuristic. Test results in artificial and real-world graphs show that our approach is able to detect overlapping clusters with better overall conductance than methods in the state of the art.
Path relinking for the vertex separator problem
2017, Expert Systems with Applications
Citation Excerpt :
Recently, the population-based path relinking framework (Glover, 1998; Glover & Laguna, 1997) has attracted much attention in combinatorial optimization and intelligent problem solving. The approach has shown outstanding performances in solving a number of challenging decision and optimization problems in various settings, such as unconstrained binary quadratic optimization (Wang, Lü, Glover, & Hao, 2012), flow shop sequencing and scheduling (Costa, Goldbarg, & Goldbarg, 2012; Peng, Lü, & Cheng, 2015; Zeng, Basseur, & Hao, 2013), clustering (Martins de Oliveira, Nogueira Lorena, Chaves, & Mauri, 2014), web services composition (Parejo, Segura, Fernandez, & Ruiz-Cortés, 2014), frequency assignment (Lai & Hao, 2015) and quadratic multiple knapsack (Chen, Hao, & Glover, 2016). PR has also been combined with other metaheuristics such as genetic algorithms (Vallada & Ruiz, 2010), scatter search (González, Oddi, Rasconi, & Varela, 2015) and GRASP (Mestria, Ochi, & Martins, 2013) to solve several difficult combinatorial problems.
This paper presents the first population-based path relinking algorithm for solving the NP-hard vertex separator problem in graphs. The proposed algorithm employs a dedicated relinking procedure to generate intermediate solutions between an initiating solution and a guiding solution taken from a reference set of elite solutions (population) and uses a fast tabu search procedure to improve some selected intermediate solutions. Special care is taken to ensure the diversity of the reference set. Dedicated data structures based on bucket sorting are employed to ensure a high computational efficiency. The proposed algorithm is assessed on four sets of 365 benchmark instances with up to 20,000 vertices, and shows highly comparative results compared to the state-of-the-art methods in the literature. Specifically, we report improved best solutions (new upper bounds) for 67 instances which can serve as reference values for assessment of other algorithms for the problem.
A comparison of two hybrid methods for constrained clustering problems
2017, Applied Soft Computing Journal
Citation Excerpt :
This paper examines the hybridization of the column generation approach for p-median problems [30]. This method was used in Oliveira et al. [27] for clustering problem. In our paper, CG method was improved and added a local search to solve a new problem named constrained clustering problem.
This paper proposes two hybrid heuristics to solve the constrained clustering problem. This problem consists of partitioning a set of objects into clusters with similar members that satisfy must-link and cannot-link constraints. A must-link constraint indicates that two selected objects must be in the same cluster, and cannot-link constraint means that two selected objects must be in distinct clusters. The two proposed hybrid methods are biased random key genetic algorithm (BRKGA) with local search (LS) heuristic and column generation (CG) with path-relinking (PR) and local search (LS) heuristic. Computational experiments considering instances available in the literature are presented to demonstrate the efficacy of the proposed methods to solve the constrained clustering problem. Moreover, the results of the CG and BRKGA are compared with the CCCG, CP and CPRBBA method.
GRASP with path relinking for the selective pickup and delivery problem
2016, Expert Systems with Applications
Citation Excerpt :
Their results indicate that the latter option achieved the best results. Besides hybridizing path relinking with GRASP, one can also find path relinking hybridized with tabu search (Jia & Hu, 2014; Lai & Hao, 2015; Peng, Lü, & Cheng, 2015; Urrutia, Milanés, & Løkketangen, 2015), local search (Yang, Zhang, & Zhu, 2015), population-based metaheuristics (de Oliveira, Enayatifar, Sadaei, Guimarães, & Potvin, 2016; Hamdi-Dhaoui, Labadie, & Yalaoui, 2014; Marinakis & Marinaki, 2015; Martí, Corberán, & Peiró, 2015; Ribas, Companys, & Tort-Martorell, 2015) and mathematical programming based approaches (de Oliveira, Lorena, Chaves, & Mauri, 2014; Li, Chu, Prins, & Zhu, 2014). One important component with path relinking is the neighborhood operator for moving from the initial solution to the guiding solution.
Bike sharing systems are very popular nowadays. One of the characteristics is that bikes are picked up from some surplus bike stations and transported to all deficit bike stations by a repositioning vehicle with limited capacity to satisfy the demand of deficit bike stations. Motivated by this real world bicycle repositioning problem, we study the selective pickup and delivery problem, where demand at every delivery node has to be satisfied by the supply collected from a subset of pickup nodes. The objective is to minimize the total travel cost incurred from visiting the nodes. We present a GRASP with path-relinking for solving the described problem. Experimental results show that this simple heuristic improves the existing results in the literature with an average improvement of 5.72% using small computing times. The proposed heuristic can contribute to the development of effective and efficient algorithms for real world bicycle reposition operations.
Path relinking for the fixed spectrum frequency assignment problem
2015, Expert Systems with Applications
The fixed spectrum frequency assignment problem (FS-FAP) is a highly relevant application in modern wireless systems. This paper presents the first path relinking (PR) approach for solving FS-FAP. We devise four relinking operators to generate intermediate solutions (or paths) and a tabu search procedure for local optimization. We also adopt a diversity-and-quality technique to maintain population diversity. To show the effectiveness of the proposed approach, we present computational results on the set of 42 benchmark instances commonly used in the literature and compare them with the current best results obtained by any other existing methods. By showing improved best results (new upper bounds) for 19 instances, we demonstrate the effectiveness of the proposed PR approach. We investigate the impact of the relinking operators and the population updating strategy. The ideas of the proposed approach could be applicable to other frequency assignment problems and search problems.
Forming the clusters of labour migrants by the degree of risk of HIV infection
2016, Eastern-European Journal of Enterprise Technologies
